
Documentation for aalibrary.utils

The utils submodule provides functions for interacting with cloud providers. These include obtaining metadata about the data files, such as how many files exist for a particular cruise in NCEI.

Modules:

Name Description
cloud_utils

This file contains all utility functions for Active Acoustics.

discrepancies

This file is used to identify discrepancies between what data exists locally versus what exists on the cloud.

frequency_data

This module contains the FrequencyData class.

gcp_utils

This file contains code pertaining to auxiliary functions related to parsing

helpers

For helper functions.

ices
nc_reader

This file is used to get header information out of a NetCDF file.

ncei_cache_daily_script

Script to get all objects in the NCEI S3 bucket and cache them to BigQuery.

ncei_utils

This file contains code pertaining to auxiliary functions related to parsing

sonar_checker
timings

"This script deals with the times associated with ingesting/preprocessing

cloud_utils

This file contains all utility functions for Active Acoustics.

Functions:

Name Description
bq_query_to_pandas

Takes a SQL query and returns the end result as a DataFrame.

check_existence_of_supplemental_files

Checks the existence of supplemental files (idx, bot, etc.) for a raw file. Will check for existence in all data sources.

check_if_file_exists_in_gcp

Checks whether a particular file exists in GCP using the file path (blob).

check_if_file_exists_in_s3

Checks to see if a file exists in an s3 bucket. Intended for use with NCEI, but will work with other s3 buckets as well.

check_if_netcdf_file_exists_in_gcp

Checks if a netcdf file exists in GCP storage. If the bucket location is not specified, it will use the helpers to parse the correct location.

count_objects_in_s3_bucket_location

Counts the number of objects within a bucket location.

count_subdirectories_in_s3_bucket_location

Counts the number of subdirectories within a bucket location.

create_s3_objs

Creates the s3 objects needed for using boto3 for a particular bucket.

delete_file_from_gcp

Deletes a file from the storage bucket.

download_file_from_gcp

Downloads a file from the blob storage bucket.

download_file_from_gcp_as_string

Downloads a file from the blob storage bucket as a text string.

get_data_lake_directory_client

Creates a data lake directory client. Returns an object of type DataLakeServiceClient.

get_object_key_for_s3

Creates an object key for a file within s3 given the parameters above.

get_service_client_sas

Gets an Azure service client using an SAS (shared access signature) token.

get_subdirectories_in_s3_bucket_location

Gets a list of all the subdirectories in a specific bucket location (prefix).

list_all_folders_in_gcp_bucket_location

Lists all of the folders in a GCP storage bucket location.

list_all_objects_in_gcp_bucket_location

Gets all of the files within a GCP storage bucket location.

list_all_objects_in_s3_bucket_location

Lists all of the objects in a s3 bucket location denoted by prefix.

setup_gbq_client_objs

Sets up Google BigQuery client objects used to execute queries, along with a GCS file-system object.

setup_gcp_storage_objs

Sets up Google Cloud Platform storage objects for use in accessing and modifying storage buckets.

upload_file_to_gcp_bucket

Uploads a file to the blob storage bucket.

bq_query_to_pandas(client=None, query='')

Takes a SQL query and returns the end result as a DataFrame.

Source code in src\aalibrary\utils\cloud_utils.py
def bq_query_to_pandas(client: bigquery.Client = None, query: str = ""):
    """Takes a SQL query and returns the end result as a DataFrame."""

    job = client.query(query)
    return job.result().to_dataframe()
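
A minimal usage sketch (not part of the library source): the client comes from setup_gbq_client_objs, and the table reference in the query is a placeholder.

from aalibrary.utils.cloud_utils import bq_query_to_pandas, setup_gbq_client_objs

gcp_bq_client, _ = setup_gbq_client_objs()
# Hypothetical table reference, used for illustration only.
df = bq_query_to_pandas(
    client=gcp_bq_client,
    query="SELECT * FROM `ggn-nmfs-aa-dev-1.example_dataset.example_table` LIMIT 10",
)
print(df.head())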

check_existence_of_supplemental_files(file_name='', file_type='raw', ship_name='', survey_name='', echosounder='', debug=False)

Checks the existence of supplemental files (idx, bot, etc.) for a raw file. Will check for existence in all data sources.

Parameters:

Name Type Description Default
file_name str

The file name (includes extension). Defaults to "".

''
file_type str

The file type (do not include the dot "."). Defaults to "raw".

'raw'
ship_name str

The ship name associated with this survey. Defaults to "".

''
survey_name str

The survey name/identifier. Defaults to "".

''
echosounder str

The echosounder used to gather the data. Defaults to "".

''
debug bool

Whether or not to print debug statements. Defaults to False.

False

Returns:

Name Type Description
RawFile RawFile

Returns a RawFile object; file existence can be checked via its boolean attributes. Ex. rf.idx_file_exists_in_ncei

Source code in src\aalibrary\utils\cloud_utils.py
def check_existence_of_supplemental_files(
    file_name: str = "",
    file_type: str = "raw",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    debug: bool = False,
) -> RawFile:
    """Checks the existence of supplemental files (idx, bot, etc.) for a raw
    file. Will check for existence in all data sources.

    Args:
        file_name (str, optional): The file name (includes extension).
            Defaults to "".
        file_type (str, optional): The file type (do not include the dot ".").
            Defaults to "raw".
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier. Defaults
            to "".
        echosounder (str, optional): The echosounder used to gather the data.
            Defaults to "".
        debug (bool, optional): Whether or not to print debug statements.
            Defaults to False.

    Returns:
        RawFile: Returns a RawFile object; file existence can be checked via
            its boolean attributes.
            Ex. rf.idx_file_exists_in_ncei
    """

    # Create connection vars
    gcp_stor_client, gcp_bucket_name, gcp_bucket = setup_gcp_storage_objs()
    _, s3_resource, _ = create_s3_objs()

    # Create the RawFile object.
    rf = RawFile(
        file_name=file_name,
        file_type=file_type,
        ship_name=ship_name,
        survey_name=survey_name,
        echosounder=echosounder,
        debug=debug,
        gcp_bucket=gcp_bucket,
        gcp_bucket_name=gcp_bucket_name,
        gcp_stor_client=gcp_stor_client,
        s3_resource=s3_resource,
    )

    return rf
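
A usage sketch (not part of the library source); the file name and echosounder are placeholders, while the ship and survey names reuse the NCEI examples found elsewhere on this page.

from aalibrary.utils.cloud_utils import check_existence_of_supplemental_files

rf = check_existence_of_supplemental_files(
    file_name="example.raw",  # placeholder file name
    file_type="raw",
    ship_name="Reuben_Lasker",
    survey_name="RL2107",
    echosounder="EK80",  # placeholder echosounder
)
# Existence flags are exposed as boolean attributes on the RawFile object.
print(rf.idx_file_exists_in_ncei)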

check_if_file_exists_in_gcp(bucket=None, file_path='')

Checks whether a particular file exists in GCP using the file path (blob).

Parameters:

Name Type Description Default
bucket Bucket

The bucket object used to check for the file. Defaults to None.

None
file_path str

The blob file path within the bucket. Defaults to "".

''

Returns:

Name Type Description
bool bool

True if the file already exists, False otherwise.

Source code in src\aalibrary\utils\cloud_utils.py
def check_if_file_exists_in_gcp(
    bucket: storage.Bucket = None, file_path: str = ""
) -> bool:
    """Checks whether a particular file exists in GCP using the file path
    (blob).

    Args:
        bucket (storage.Bucket, optional): The bucket object used to check for
            the file. Defaults to None.
        file_path (str, optional): The blob file path within the bucket.
            Defaults to "".

    Returns:
        bool: True if the file already exists, False otherwise.
    """

    return bucket.blob(file_path).exists()
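
A usage sketch (not part of the library source); the blob path is a placeholder.

from aalibrary.utils.cloud_utils import (
    check_if_file_exists_in_gcp,
    setup_gcp_storage_objs,
)

_, _, gcp_bucket = setup_gcp_storage_objs()
exists = check_if_file_exists_in_gcp(
    bucket=gcp_bucket,
    file_path="NCEI/Reuben_Lasker/RL2107/EK80/data/example.raw",  # placeholder
)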

check_if_file_exists_in_s3(object_key='', s3_resource=None, s3_bucket_name='')

Checks to see if a file exists in an s3 bucket. Intended for use with NCEI, but will work with other s3 buckets as well.

Parameters:

Name Type Description Default
object_key str

The object key (location of the object). Defaults to "".

''
s3_resource resource

The boto3 resource for this particular bucket. Defaults to None.

None
s3_bucket_name str

The bucket name. Defaults to "".

''

Returns:

Name Type Description
bool bool

True if the file exists within the bucket. False otherwise.

Source code in src\aalibrary\utils\cloud_utils.py
def check_if_file_exists_in_s3(
    object_key: str = "",
    s3_resource: boto3.resource = None,
    s3_bucket_name: str = "",
) -> bool:
    """Checks to see if a file exists in an s3 bucket. Intended for use with
    NCEI, but will work with other s3 buckets as well.

    Args:
        object_key (str, optional): The object key (location of the object).
            Defaults to "".
        s3_resource (boto3.resource, optional): The boto3 resource for this
            particular bucket. Defaults to None.
        s3_bucket_name (str, optional): The bucket name. Defaults to "".

    Returns:
        bool: True if the file exists within the bucket. False otherwise.
    """

    try:
        s3_resource.Object(s3_bucket_name, object_key).load()
        return True
    except Exception:
        # object key does not exist.
        # print(e)
        return False
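
A usage sketch against the public NCEI bucket (not part of the library source); the object key follows the data/raw/{ship_name}/{survey_name}/{echosounder}/{file_name} layout used by get_object_key_for_s3, with placeholder values.

from aalibrary.utils.cloud_utils import check_if_file_exists_in_s3, create_s3_objs

_, s3_resource, _ = create_s3_objs()
exists = check_if_file_exists_in_s3(
    object_key="data/raw/Reuben_Lasker/RL2107/EK80/example.raw",  # placeholder key
    s3_resource=s3_resource,
    s3_bucket_name="noaa-wcsd-pds",
)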

check_if_netcdf_file_exists_in_gcp(file_name='', ship_name='', survey_name='', echosounder='', data_source='', gcp_storage_bucket_location='', gcp_bucket=None, debug=False)

Checks if a netcdf file exists in GCP storage. If the bucket location is not specified, it will use the helpers to parse the correct location.

Parameters:

Name Type Description Default
file_name str

The file name (includes extension). Defaults to "".

''
ship_name str

The ship name associated with this survey. Defaults to "".

''
survey_name str

The survey name/identifier. Defaults to "".

''
echosounder str

The echosounder used to gather the data. Defaults to "".

''
data_source str

The source of the file. Necessary due to the way the storage bucket is organized. Can be one of ["NCEI", "OMAO", "HDD"]. Defaults to "".

''
gcp_storage_bucket_location str

The string representing the blob's location within the storage bucket. Defaults to "".

''
gcp_bucket Bucket

The bucket object used for downloading.

None
debug bool

Whether or not to print debug statements. Defaults to False.

False

Returns:

Name Type Description
bool bool

True if the file exists in GCP, False otherwise.

Source code in src\aalibrary\utils\cloud_utils.py
def check_if_netcdf_file_exists_in_gcp(
    file_name: str = "",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    data_source: str = "",
    gcp_storage_bucket_location: str = "",
    gcp_bucket: storage.Bucket = None,
    debug: bool = False,
) -> bool:
    """Checks if a netcdf file exists in GCP storage. If the bucket location is
    not specified, it will use the helpers to parse the correct location.

    Args:
        file_name (str, optional): The file name (includes extension).
            Defaults to "".
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier.
            Defaults to "".
        echosounder (str, optional): The echosounder used to gather the data.
            Defaults to "".
        data_source (str, optional): The source of the file. Necessary due to
            the way the storage bucket is organized. Can be one of
            ["NCEI", "OMAO", "HDD"]. Defaults to "".
        gcp_storage_bucket_location (str, optional): The string representing
            the blob's location within the storage bucket. Defaults to "".
        gcp_bucket (storage.Bucket): The bucket object used for downloading.
        debug (bool, optional): Whether or not to print debug statements.
            Defaults to False.

    Returns:
        bool: True if the file exists in GCP, False otherwise.
    """

    # Parse the correct location when one was not provided.
    if gcp_storage_bucket_location == "":
        gcp_storage_bucket_location = (
            helpers.parse_correct_gcp_storage_bucket_location(
                file_name=file_name,
                file_type="netcdf",
                survey_name=survey_name,
                ship_name=ship_name,
                echosounder=echosounder,
                data_source=data_source,
                is_metadata=False,
                debug=debug,
            )
        )
    netcdf_gcp_storage_bucket_location = (
        get_netcdf_gcp_location_from_raw_gcp_location(
            gcp_storage_bucket_location=gcp_storage_bucket_location
        )
    )
    # check if the file exists in gcp
    return check_if_file_exists_in_gcp(
        bucket=gcp_bucket, file_path=netcdf_gcp_storage_bucket_location
    )

count_objects_in_s3_bucket_location(prefix='', bucket=None)

Counts the number of objects within a bucket location. NOTE: This DOES NOT include folders, as those do not count as objects.

Parameters:

Name Type Description Default
prefix str

The bucket location. Defaults to "".

''
bucket resource

The bucket resource object. Defaults to None.

None

Returns:

Name Type Description
int int

The count of objects within the location.

Source code in src\aalibrary\utils\cloud_utils.py
def count_objects_in_s3_bucket_location(
    prefix: str = "", bucket: boto3.resource = None
) -> int:
    """Counts the number of objects within a bucket location.
    NOTE: This DOES NOT include folders, as those do not count as objects.

    Args:
        prefix (str, optional): The bucket location. Defaults to "".
        bucket (boto3.resource, optional): The bucket resource object.
            Defaults to None.

    Returns:
        int: The count of objects within the location.
    """

    count = sum(1 for _ in bucket.objects.filter(Prefix=prefix).all())
    return count

count_subdirectories_in_s3_bucket_location(prefix='', bucket=None)

Counts the number of subdirectories within a bucket location.

Parameters:

Name Type Description Default
prefix str

The bucket location. Defaults to "".

''
bucket resource

The bucket resource object. Defaults to None.

None

Returns:

Name Type Description
int int

The count of subdirectories within the location.

Source code in src\aalibrary\utils\cloud_utils.py
def count_subdirectories_in_s3_bucket_location(
    prefix: str = "", bucket: boto3.resource = None
) -> int:
    """Counts the number of subdirectories within a bucket location.

    Args:
        prefix (str, optional): The bucket location. Defaults to "".
        bucket (boto3.resource, optional): The bucket resource object.
            Defaults to None.

    Returns:
        int: The count of subdirectories within the location.
    """

    subdirs = set()
    for obj in bucket.objects.filter(Prefix=prefix):
        # Use the object's parent "folder" as the subdirectory key.
        subdir = "/".join(obj.key.split("/")[:-1])
        if subdir:
            subdirs.add(subdir)
    return len(subdirs)
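
A usage sketch covering both counting helpers above (not part of the library source); the prefix is illustrative.

from aalibrary.utils.cloud_utils import (
    count_objects_in_s3_bucket_location,
    count_subdirectories_in_s3_bucket_location,
    create_s3_objs,
)

_, _, s3_bucket = create_s3_objs()
n_objects = count_objects_in_s3_bucket_location(
    prefix="data/raw/Reuben_Lasker/RL2107/", bucket=s3_bucket
)
n_subdirs = count_subdirectories_in_s3_bucket_location(
    prefix="data/raw/Reuben_Lasker/RL2107/", bucket=s3_bucket
)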

create_s3_objs(bucket_name='noaa-wcsd-pds')

Creates the s3 objects needed for using boto3 for a particular bucket.

Parameters:

Name Type Description Default
bucket_name str

The bucket you want to refer to. The default points to the NCEI bucket. Defaults to "noaa-wcsd-pds".

'noaa-wcsd-pds'

Returns:

Name Type Description
Tuple Tuple

The s3 client (used for certain portions of the boto3 api), the s3 resource (newer, more used object for accessing s3 buckets), and the actual s3 bucket itself.

Source code in src\aalibrary\utils\cloud_utils.py
def create_s3_objs(bucket_name: str = "noaa-wcsd-pds") -> Tuple:
    """Creates the s3 objects needed for using boto3 for a particular bucket.

    Args:
        bucket_name (str, optional): The bucket you want to refer to. The
            default points to the NCEI bucket. Defaults to "noaa-wcsd-pds".

    Returns:
        Tuple: The s3 client (used for certain portions of the boto3 api), the
            s3 resource (newer, more used object for accessing s3 buckets), and
            the actual s3 bucket itself.
    """

    # Setup access to S3 bucket as an anonymous user
    s3_client = boto3.client(
        "s3",
        aws_access_key_id="",
        aws_secret_access_key="",
        config=Config(signature_version=UNSIGNED),
    )
    s3_resource = boto3.resource(
        "s3",
        aws_access_key_id="",
        aws_secret_access_key="",
        config=Config(signature_version=UNSIGNED),
    )

    s3_bucket = s3_resource.Bucket(bucket_name)

    return s3_client, s3_resource, s3_bucket

delete_file_from_gcp(gcp_bucket, blob_file_path)

Deletes a file from the storage bucket.

Parameters:

Name Type Description Default
gcp_bucket bucket

The bucket object used for downloading from.

required
blob_file_path str

The blob's file path. Ex. "data/itds/logs/execute_rasp_ii/temp.csv" NOTE: This must include the file name as well as the extension.

required

Raises: AssertionError: If the file does not exist in GCP. Exception: If there is an error deleting the file.

Source code in src\aalibrary\utils\cloud_utils.py
def delete_file_from_gcp(
    gcp_bucket: storage.Client.bucket, blob_file_path: str
):
    """Deletes a file from the storage bucket.

    Args:
        gcp_bucket (storage.Client.bucket): The bucket object used for
            downloading from.
        blob_file_path (str): The blob's file path.
            Ex. "data/itds/logs/execute_rasp_ii/temp.csv"
            NOTE: This must include the file name as well as the extension.
    Raises:
        AssertionError: If the file does not exist in GCP.
        Exception: If there is an error deleting the file.
    """

    file_exists_in_gcp = check_if_file_exists_in_gcp(
        gcp_bucket, blob_file_path
    )
    assert (
        file_exists_in_gcp
    ), f"File does not exist in GCP at `{blob_file_path}`."

    blob = gcp_bucket.blob(blob_file_path)
    try:
        blob.delete()
        return
    except Exception:
        print(traceback.format_exc())
        raise

download_file_from_gcp(gcp_bucket, blob_file_path, local_file_path, debug=False)

Downloads a file from the blob storage bucket.

Parameters:

Name Type Description Default
gcp_bucket bucket

The bucket object used for downloading from.

required
blob_file_path str

The blob's file path. Ex. "data/itds/logs/execute_rasp_ii/temp.csv" NOTE: This must include the file name as well as the extension.

required
local_file_path str

The local file path you wish to download the blob to.

required
debug bool

Whether or not to print debug statements.

False
Source code in src\aalibrary\utils\cloud_utils.py
def download_file_from_gcp(
    gcp_bucket: storage.Client.bucket,
    blob_file_path: str,
    local_file_path: str,
    debug: bool = False,
):
    """Downloads a file from the blob storage bucket.

    Args:
        gcp_bucket (storage.Client.bucket): The bucket object used for
            downloading from.
        blob_file_path (str): The blob's file path.
            Ex. "data/itds/logs/execute_rasp_ii/temp.csv"
            NOTE: This must include the file name as well as the extension.
        local_file_path (str): The local file path you wish to download the
            blob to.
        debug (bool): Whether or not to print debug statements.
    """

    blob = gcp_bucket.blob(blob_file_path, chunk_size=1024 * 1024 * 1)
    # Download from blob
    try:
        blob.download_to_filename(local_file_path)
        if debug:
            print(f"New data downloaded to {local_file_path}")
    except Exception:
        print(traceback.format_exc())
        raise
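
A usage sketch (not part of the library source); the blob path reuses the example from the docstring and the local destination is a placeholder.

from aalibrary.utils.cloud_utils import download_file_from_gcp, setup_gcp_storage_objs

_, _, gcp_bucket = setup_gcp_storage_objs()
download_file_from_gcp(
    gcp_bucket=gcp_bucket,
    blob_file_path="data/itds/logs/execute_rasp_ii/temp.csv",
    local_file_path="temp.csv",  # placeholder local destination
    debug=True,
)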

download_file_from_gcp_as_string(gcp_bucket, blob_file_path)

Downloads a file from the blob storage bucket as a text string.

Parameters:

Name Type Description Default
gcp_bucket bucket

The bucket object used for downloading from.

required
blob_file_path str

The blob's file path. Ex. "data/itds/logs/execute_rasp_ii/temp.csv" NOTE: This must include the file name as well as the extension.

required
Source code in src\aalibrary\utils\cloud_utils.py
def download_file_from_gcp_as_string(
    gcp_bucket: storage.Client.bucket,
    blob_file_path: str,
):
    """Downloads a file from the blob storage bucket as a text string.

    Args:
        gcp_bucket (storage.Client.bucket): The bucket object used for
            downloading from.
        blob_file_path (str): The blob's file path.
            Ex. "data/itds/logs/execute_rasp_ii/temp.csv"
            NOTE: This must include the file name as well as the extension.
    """

    blob = gcp_bucket.blob(blob_file_path, chunk_size=1024 * 1024 * 1)
    # Download from blob
    try:
        return blob.download_as_text(encoding='utf-8')
    except Exception:
        print(traceback.format_exc())
        raise

get_data_lake_directory_client(config_file_path='')

Creates a data lake directory client. Returns an object of type DataLakeServiceClient.

Parameters:

Name Type Description Default
config_file_path str

The location of the config file. Needs a [DEFAULT] section with a azure_connection_string variable defined. Defaults to "".

''

Returns:

Name Type Description
DataLakeServiceClient DataLakeServiceClient

An object of type DataLakeServiceClient, with connection to the connection string described in the config.

Source code in src\aalibrary\utils\cloud_utils.py
def get_data_lake_directory_client(
    config_file_path: str = "",
) -> DataLakeServiceClient:
    """Creates a data lake directory client. Returns an object of type
    DataLakeServiceClient.

    Args:
        config_file_path (str, optional): The location of the config file.
            Needs a `[DEFAULT]` section with a `azure_connection_string`
            variable defined. Defaults to "".

    Returns:
        DataLakeServiceClient: An object of type DataLakeServiceClient, with
            connection to the connection string described in the config.
    """

    config = configparser.ConfigParser()
    config.read(config_file_path)

    azure_service = DataLakeServiceClient.from_connection_string(
        conn_str=config["DEFAULT"]["azure_connection_string"]
    )

    return azure_service
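
A usage sketch (not part of the library source); the config file path is a placeholder, and the expected INI contents are shown in the comment.

from aalibrary.utils.cloud_utils import get_data_lake_directory_client

# azure.ini (placeholder path) is expected to contain:
#   [DEFAULT]
#   azure_connection_string = <connection string copied from the Azure portal>
azure_service = get_data_lake_directory_client(config_file_path="azure.ini")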

get_object_key_for_s3(file_url='', file_name='', ship_name='', survey_name='', echosounder='')

Creates an object key for a file within s3 given the parameters above.

Parameters:

Name Type Description Default
file_url str

The entire url to the file resource in s3. Starts with "https://" or "s3://". Defaults to "". NOTE: If this is specified, there is no need to provide the other parameters.

''
file_name str

The file name (includes extension). Defaults to "".

''
ship_name str

The ship name associated with this survey. Defaults to "".

''
survey_name str

The survey name/identifier. Defaults to "".

''
echosounder str

The echosounder used to gather the data. Defaults to "".

''
Source code in src\aalibrary\utils\cloud_utils.py
def get_object_key_for_s3(
    file_url: str = "",
    file_name: str = "",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
):
    """Creates an object key for a file within s3 given the parameters above.

    Args:
        file_url (str, optional): The entire url to the file resource in s3.
            Starts with "https://" or "s3://". Defaults to "".
            NOTE: If this is specified, there is no need to provide the other
            parameters.
        file_name (str, optional): The file name (includes extension).
            Defaults to "".
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier.
            Defaults to "".
        echosounder (str, optional): The echosounder used to gather the data.
            Defaults to "".
    """

    if file_url:
        # We replace the beginning of common file paths
        file_url = file_url.replace(
            "https://noaa-wcsd-pds.s3.amazonaws.com/", ""
        )
        file_url = file_url.replace("s3://noaa-wcsd-pds/", "")
        return file_url
    else:
        # We default to using the parameters to create an object key according
        # to NCEI standards.
        object_key = (
            f"data/raw/{ship_name}/{survey_name}/{echosounder}/{file_name}"
        )
        return object_key
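
A usage sketch showing both input styles (not part of the library source); the file name and echosounder are placeholders.

from aalibrary.utils.cloud_utils import get_object_key_for_s3

# From a full URL:
key = get_object_key_for_s3(
    file_url="s3://noaa-wcsd-pds/data/raw/Reuben_Lasker/RL2107/EK80/example.raw"
)
# Or from the individual parameters:
key = get_object_key_for_s3(
    file_name="example.raw",
    ship_name="Reuben_Lasker",
    survey_name="RL2107",
    echosounder="EK80",
)
# Both return "data/raw/Reuben_Lasker/RL2107/EK80/example.raw".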

get_service_client_sas(account_name, sas_token)

Gets an azure service client using an SAS (shared access signature) token. The token must be created in Azure.

Parameters:

Name Type Description Default
account_name str

The name of the account you are trying to create a service client with. This is usually a storage account that is attached to the container.

required
sas_token str

The complete SAS token.

required

Returns:

Name Type Description
DataLakeServiceClient DataLakeServiceClient

An object of type DataLakeServiceClient, with connection to the container/file the SAS allows access to.

Source code in src\aalibrary\utils\cloud_utils.py
def get_service_client_sas(
    account_name: str, sas_token: str
) -> DataLakeServiceClient:
    """Gets an azure service client using an SAS (shared access signature)
    token. The token must be created in Azure.

    Args:
        account_name (str): The name of the account you are trying to create a
            service client with. This is usually a storage account that is
            attached to the container.
        sas_token (str): The complete SAS token.

    Returns:
        DataLakeServiceClient: An object of type DataLakeServiceClient, with
            connection to the container/file the SAS allows access to.
    """
    account_url = f"https://{account_name}.dfs.core.windows.net"

    # The SAS token string can be passed in as credential param or appended to
    # the account URL
    service_client = DataLakeServiceClient(account_url, credential=sas_token)

    return service_client

get_subdirectories_in_s3_bucket_location(prefix='', s3_client=None, return_full_paths=False, bucket_name='noaa-wcsd-pds')

Gets a list of all the subdirectories in a specific bucket location (called a prefix). The return can be with full paths (root to folder inclusive), or just the folder names.

Parameters:

Name Type Description Default
prefix str

The bucket folder location. Defaults to "".

''
s3_client client

The bucket client object. Defaults to None.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False
bucket_name str

The bucket name. Defaults to "noaa-wcsd-pds".

'noaa-wcsd-pds'

Returns:

Type Description
List[str]

List[str]: A list of strings, each being a subdirectory. Whether these are full paths or just folder names is specified by the return_full_paths parameter.

Source code in src\aalibrary\utils\cloud_utils.py
def get_subdirectories_in_s3_bucket_location(
    prefix: str = "",
    s3_client: boto3.client = None,
    return_full_paths: bool = False,
    bucket_name: str = "noaa-wcsd-pds",
) -> List[str]:
    """Gets a list of all the subdirectories in a specific bucket location
    (called a prefix). The return can be with full paths (root to folder
    inclusive), or just the folder names.

    Args:
        prefix (str, optional): The bucket folder location. Defaults to "".
        s3_client (boto3.client, optional): The bucket client object.
            Defaults to None.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.
        bucket_name (str, optional): The bucket name. Defaults to
            "noaa-wcsd-pds".

    Returns:
        List[str]: A list of strings, each being a subdirectory. Whether
            these are full paths or just folder names is specified by the
            `return_full_paths` parameter.
    """
    if not s3_client:
        s3_client, _, _ = create_s3_objs(bucket_name)

    subdirs = set()
    result = s3_client.list_objects(
        Bucket=bucket_name, Prefix=prefix, Delimiter="/"
    )
    # "CommonPrefixes" is absent when the prefix contains no subdirectories.
    for o in result.get("CommonPrefixes", []):
        subdir_full_path_from_prefix = o.get("Prefix")
        if return_full_paths:
            subdir = subdir_full_path_from_prefix
        else:
            subdir = subdir_full_path_from_prefix.replace(prefix, "")
            subdir = subdir.replace("/", "")
        subdirs.add(subdir)
    return list(subdirs)
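
A usage sketch listing the ship folders under the NCEI raw-data prefix (not part of the library source); an anonymous client is created internally when none is passed.

from aalibrary.utils.cloud_utils import get_subdirectories_in_s3_bucket_location

ships = get_subdirectories_in_s3_bucket_location(
    prefix="data/raw/",
    return_full_paths=False,
)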

list_all_folders_in_gcp_bucket_location(location='', gcp_bucket=None, return_full_paths=True)

Lists all of the folders in a GCP storage bucket location.

Parameters:

Name Type Description Default
location str

The blob location you would like to get the folders of. Defaults to "".

''
gcp_bucket bucket

The gcp bucket to use. Defaults to None.

None
return_full_paths bool

Whether or not to return full paths. Defaults to True.

True

Returns:

Type Description
List[str]

List[str]: A list of strings containing the folder names or full paths.

Source code in src\aalibrary\utils\cloud_utils.py
def list_all_folders_in_gcp_bucket_location(
    location: str = "",
    gcp_bucket: storage.Client.bucket = None,
    return_full_paths: bool = True,
) -> List[str]:
    """Lists all of the folders in a GCP storage bucket location.

    Args:
        location (str, optional): The blob location you would like to get the
            folders of. Defaults to "".
        gcp_bucket (storage.Client.bucket, optional): The gcp bucket to use.
            Defaults to None.
        return_full_paths (bool, optional): Whether or not to return full
            paths. Defaults to True.

    Returns:
        List[str]: A list of strings containing the folder names or full paths.
    """

    if location and not location.endswith("/"):
        location += "/"

    blobs_iterator = gcp_bucket.list_blobs(prefix=location, delimiter="/")

    folder_prefixes = []
    # We MUST iterate through all blobs, since this is a lazy-loading iterator.
    for _ in blobs_iterator:
        ...

    if blobs_iterator.prefixes:
        for p in blobs_iterator.prefixes:
            folder_prefixes.append(p)

    if return_full_paths:
        return folder_prefixes
    else:
        return [b.split("/")[-2] for b in folder_prefixes]

list_all_objects_in_gcp_bucket_location(location='', gcp_bucket=None)

Gets all of the files within a GCP storage bucket location.

Parameters:

Name Type Description Default
location str

The location to search for files. Defaults to "". Ex. "NCEI/Reuben_Lasker/RL2107"

''
gcp_bucket bucket

The gcp bucket to use. Defaults to None.

None

Returns:

Type Description
List[str]

List[str]: A list of strings containing all URIs for each file in the bucket.

Source code in src\aalibrary\utils\cloud_utils.py
def list_all_objects_in_gcp_bucket_location(
    location: str = "", gcp_bucket: storage.Client.bucket = None
) -> List[str]:
    """Gets all of the files within a GCP storage bucket location.

    Args:
        location (str, optional): The location to search for files. Defaults
            to "".
            Ex. "NCEI/Reuben_Lasker/RL2107"
        gcp_bucket (storage.Client.bucket, optional): The gcp bucket to use.
            Defaults to None.

    Returns:
        List[str]: A list of strings containing all URIs for each file in the
            bucket.
    """

    all_blobs_in_this_location = []
    for blob in gcp_bucket.list_blobs(prefix=location):
        all_blobs_in_this_location.append(blob.name)
    return all_blobs_in_this_location
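
A usage sketch (not part of the library source), reusing the location example from the docstring.

from aalibrary.utils.cloud_utils import (
    list_all_objects_in_gcp_bucket_location,
    setup_gcp_storage_objs,
)

_, _, gcp_bucket = setup_gcp_storage_objs()
blob_names = list_all_objects_in_gcp_bucket_location(
    location="NCEI/Reuben_Lasker/RL2107",
    gcp_bucket=gcp_bucket,
)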

list_all_objects_in_s3_bucket_location(prefix='', s3_resource=None, return_full_paths=False, bucket_name='noaa-wcsd-pds')

Lists all of the objects in a s3 bucket location denoted by prefix. Returns a list containing str. You get full paths if you specify the return_full_paths parameter.

Parameters:

Name Type Description Default
prefix str

The bucket location. Defaults to "".

''
s3_resource resource

The bucket resource object. Defaults to None.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False
bucket_name str

The bucket name. Defaults to "noaa-wcsd-pds".

'noaa-wcsd-pds'

Returns:

Type Description
List[str]

List[str]: A list of strings containing either the object names or full paths, depending on the return_full_paths parameter.

Source code in src\aalibrary\utils\cloud_utils.py
def list_all_objects_in_s3_bucket_location(
    prefix: str = "",
    s3_resource: boto3.resource = None,
    return_full_paths: bool = False,
    bucket_name: str = "noaa-wcsd-pds",
) -> List[str]:
    """Lists all of the objects in a s3 bucket location denoted by `prefix`.
    Returns a list containing str. You get full paths if you specify the
    `return_full_paths` parameter.

    Args:
        prefix (str, optional): The bucket location. Defaults to "".
        s3_resource (boto3.resource, optional): The bucket resource object.
            Defaults to None.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.
        bucket_name (str, optional): The bucket name. Defaults to
            "noaa-wcsd-pds".

    Returns:
        List[str]: A list of strings containing either the object names or
            full paths, depending on the `return_full_paths` parameter.
    """
    if not s3_resource:
        _, s3_resource, _ = create_s3_objs(bucket_name)

    object_keys = set()
    bucket = s3_resource.Bucket(bucket_name)
    for obj in bucket.objects.filter(Prefix=prefix):
        if return_full_paths:
            object_keys.add(obj.key)
        else:
            object_keys.add(obj.key.split("/")[-1])

    return list(object_keys)

setup_gbq_client_objs(location='US', project_id='ggn-nmfs-aa-dev-1')

Sets up Google BigQuery client objects used to execute queries, along with a GCS file-system object.

Parameters:

Name Type Description Default
location str

The location of the big-query tables/database. This is usually set when creating the database in big query. Defaults to "US".

'US'
project_id str

The project id that the big query instance belongs to. Defaults to "ggn-nmfs-aa-dev-1".

'ggn-nmfs-aa-dev-1'

Returns:

Name Type Description
Tuple Tuple[Client, GCSFileSystem]

The big query client object, along with an object for the Google Cloud Storage file system.

Source code in src\aalibrary\utils\cloud_utils.py
def setup_gbq_client_objs(
    location: str = "US", project_id: str = "ggn-nmfs-aa-dev-1"
) -> Tuple[bigquery.Client, gcsfs.GCSFileSystem]:
    """Sets up Google Big Query client objects used to execute queries and
    such.

    Args:
        location (str, optional): The location of the big-query
            tables/database. This is usually set when creating the database in
            big query. Defaults to "US".
        project_id (str, optional): The project id that the big query instance
            belongs to. Defaults to "ggn-nmfs-aa-dev-1".

    Returns:
        Tuple: The big query client object, along with an object for the Google
            Cloud Storage file system.
    """

    gcp_bq_client = bigquery.Client(location=location)

    gcp_gcs_file_system = gcsfs.GCSFileSystem(project=project_id)

    return gcp_bq_client, gcp_gcs_file_system

setup_gcp_storage_objs(project_id='ggn-nmfs-aa-dev-1', gcp_bucket_name='ggn-nmfs-aa-dev-1-data')

Sets up Google Cloud Platform storage objects for use in accessing and modifying storage buckets.

Parameters:

Name Type Description Default
project_id str

The project id of the project you want to access. Defaults to "ggn-nmfs-aa-dev-1".

'ggn-nmfs-aa-dev-1'
gcp_bucket_name str

The name of the exact bucket you want to access. Defaults to "ggn-nmfs-aa-dev-1-data".

'ggn-nmfs-aa-dev-1-data'

Returns:

Type Description
Tuple[Client, str, bucket]

Tuple[storage.Client, str, storage.Client.bucket]: The storage client, followed by the GCP bucket name (str) and then the actual bucket object itself (which will be executing the commands used in this api).

Source code in src\aalibrary\utils\cloud_utils.py
def setup_gcp_storage_objs(
    project_id: str = "ggn-nmfs-aa-dev-1",
    gcp_bucket_name: str = "ggn-nmfs-aa-dev-1-data",
) -> Tuple[storage.Client, str, storage.Client.bucket]:
    """Sets up Google Cloud Platform storage objects for use in accessing and
    modifying storage buckets.

    Args:
        project_id (str, optional): The project id of the project you want to
            access. Defaults to "ggn-nmfs-aa-dev-1".
        gcp_bucket_name (str, optional): The name of the exact bucket you want
            to access. Defaults to "ggn-nmfs-aa-dev-1-data".

    Returns:
        Tuple[storage.Client, str, storage.Client.bucket]: The storage client,
            followed by the GCP bucket name (str) and then the actual bucket
            object itself (which will be executing the commands used in this
            api).
    """

    gcp_stor_client = storage.Client(project=project_id)

    gcp_bucket = gcp_stor_client.bucket(gcp_bucket_name)

    return (gcp_stor_client, gcp_bucket_name, gcp_bucket)

upload_file_to_gcp_bucket(bucket, blob_file_path, local_file_path, debug=False)

Uploads a file to the blob storage bucket.

Parameters:

Name Type Description Default
bucket bucket

The bucket object used for uploading.

required
blob_file_path str

The blob's file path. Ex. "data/itds/logs/execute_code_files/temp.csv" NOTE: This must include the file name as well as the extension.

required
local_file_path str

The local file path you wish to upload to the blob.

required
debug bool

Whether or not to print debug statements.

False
Source code in src\aalibrary\utils\cloud_utils.py
def upload_file_to_gcp_bucket(
    bucket: storage.Client.bucket,
    blob_file_path: str,
    local_file_path: str,
    debug: bool = False,
):
    """Uploads a file to the blob storage bucket.

    Args:
        bucket (storage.Client.bucket): The bucket object used for uploading.
        blob_file_path (str): The blob's file path.
            Ex. "data/itds/logs/execute_code_files/temp.csv"
            NOTE: This must include the file name as well as the extension.
        local_file_path (str): The local file path you wish to upload to the
            blob.
        debug (bool): Whether or not to print debug statements.
    """

    if not bucket:
        _, _, bucket = setup_gcp_storage_objs()

    blob = bucket.blob(blob_file_path, chunk_size=1024 * 1024 * 1)
    # Upload a new blob
    try:
        blob.upload_from_filename(local_file_path)
        if debug:
            print(f"New data uploaded to {blob.name}")
    except Exception:
        print(traceback.format_exc())
        raise
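
A usage sketch (not part of the library source); the blob path reuses the docstring example and the local source file is a placeholder.

from aalibrary.utils.cloud_utils import setup_gcp_storage_objs, upload_file_to_gcp_bucket

_, _, gcp_bucket = setup_gcp_storage_objs()
upload_file_to_gcp_bucket(
    bucket=gcp_bucket,
    blob_file_path="data/itds/logs/execute_code_files/temp.csv",
    local_file_path="temp.csv",  # placeholder local source file
    debug=True,
)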

discrepancies

This file is used to identify discrepancies between what data exists locally versus what exists on the cloud. It considers the following when comparing:

* Number of files per cruise
* File names/types
* File sizes
* Checksums

Functions:

Name Description
compare_local_cruise_files_to_cloud

Compares the locally stored cruise files (per echosounder) to what

get_local_file_size

Gets the size of a local file in bytes.

get_local_sha256_checksum

Calculates the SHA256 checksum of a file.

compare_local_cruise_files_to_cloud(local_cruise_file_path='', ship_name='', survey_name='', echosounder='')

Compares the locally stored cruise files (per echosounder) to what exists on the cloud by number of files, file sizes, and checksums. Reports any discrepancies in the console.

Parameters:

Name Type Description Default
local_cruise_file_path str

The folder path for the locally stored cruise data. Defaults to "".

''
ship_name str

The ship name that the cruise falls under. Defaults to "".

''
survey_name str

The survey/cruise name. Defaults to "".

''
echosounder str

The specific echosounder you want to check. Defaults to "".

''
Source code in src\aalibrary\utils\discrepancies.py
def compare_local_cruise_files_to_cloud(
    local_cruise_file_path: str = "",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
):
    """Compares the locally stored cruise files (per echosounder) to what
    exists on the cloud by number of files, file sizes, and
    checksums. Reports any discrepancies in the console.

    Args:
        local_cruise_file_path (str, optional): The folder path for the locally
            stored cruise data. Defaults to "".
        ship_name (str, optional): The ship name that the cruise falls under.
            Defaults to "".
        survey_name (str, optional): The survey/cruise name. Defaults to "".
        echosounder (str, optional): The specific echosounder you want to
            check. Defaults to "".
    """

    # Create vars for use later
    _, s3_resource, _ = create_s3_objs()

    # Get all local files paths in cruise directory
    all_raw_file_paths = glob.glob(local_cruise_file_path + "/*.raw")
    all_idx_file_paths = glob.glob(local_cruise_file_path + "/*.idx")
    all_bot_file_paths = glob.glob(local_cruise_file_path + "/*.bot")
    # Check file numbers & types
    num_local_raw_files = len(all_raw_file_paths)
    num_local_idx_files = len(all_idx_file_paths)
    num_local_bot_files = len(all_bot_file_paths)
    num_local_files = (
        num_local_raw_files + num_local_idx_files + num_local_bot_files
    )
    # Get file names along with file paths
    # [(local_file_path, file_name_with_extension), (...)]
    all_raw_file_paths = [
        (file_path, file_path.split("/")[-1])
        for file_path in all_raw_file_paths
    ]
    all_idx_file_paths = [
        (file_path, file_path.split("/")[-1])
        for file_path in all_idx_file_paths
    ]
    all_bot_file_paths = [
        (file_path, file_path.split("/")[-1])
        for file_path in all_bot_file_paths
    ]

    # Compare number of files in cruise, local vs cloud
    # The helper returns file names; take its length for the S3 count.
    num_files_in_s3 = len(
        get_all_file_names_in_a_surveys_echosounder_folder(
            ship_name=ship_name,
            survey_name=survey_name,
            echosounder=echosounder,
            s3_resource=s3_resource,
            return_full_paths=False,
        )
    )
    if num_files_in_s3 == num_local_files:
        print(
            "NUMBER OF FILES MATCH FOR"
            f" {ship_name}/{survey_name}/{echosounder}"
        )
    else:
        print(
            "NUMBER OF FILES DO NOT MATCH FOR"
            f" {ship_name}/{survey_name}/{echosounder}"
        )
        print(
            f"NUMBER OF FILES IN S3: {num_files_in_s3} | NUMBER OF LOCAL "
            f"FILES: {num_local_files}"
        )

    # Go through each local file, and compare file existence, size, checksum
    for local_file_path, file_name in all_raw_file_paths:
        # Create s3 object key
        s3_object_key = (
            f"data/raw/{ship_name}/{survey_name}/{echosounder}/{file_name}"
        )
        # Get existence of file in s3
        file_exists_in_s3 = check_if_file_exists_in_s3(
            object_key=s3_object_key,
            s3_resource=s3_resource,
            s3_bucket_name="noaa-wcsd-pds",
        )
        # If file exists in s3, get size and checksum
        if file_exists_in_s3:
            # Get file size for s3 object key
            s3_file_size = get_file_size_from_s3(
                object_key=s3_object_key, s3_resource=s3_resource
            )
            # Get checksum for object key
            s3_checksum = get_checksum_sha256_from_s3(
                object_key=s3_object_key, s3_resource=s3_resource
            )

        # Get local file size
        local_file_size = get_local_file_size(local_file_path)
        # Get local file checksum
        local_file_checksum = get_local_sha256_checksum(local_file_path)

        # Compare existence
        if not file_exists_in_s3:
            print(
                f"LOCAL FILE {local_file_path} DOES NOT EXIST IN S3:"
                f" {s3_object_key}"
            )
        elif file_exists_in_s3:
            # Compare file sizes
            if local_file_size != s3_file_size:
                print(
                    f"FILE SIZE MISMATCH FOR {local_file_path} | LOCAL: "
                    f"{local_file_size} | S3: {s3_file_size}"
                )
            # Compare checksums
            if local_file_checksum != s3_checksum:
                print(
                    f"CHECKSUM MISMATCH FOR {local_file_path} | LOCAL: "
                    f"{local_file_checksum} | S3: {s3_checksum}"
                )
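
A usage sketch (not part of the library source); the local folder path and echosounder are placeholders.

from aalibrary.utils.discrepancies import compare_local_cruise_files_to_cloud

compare_local_cruise_files_to_cloud(
    local_cruise_file_path="/data/cruises/RL2107/EK80",  # placeholder local folder
    ship_name="Reuben_Lasker",
    survey_name="RL2107",
    echosounder="EK80",  # placeholder echosounder
)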

get_local_file_size(local_file_path)

Gets the size of a local file in bytes.

Parameters:

Name Type Description Default
local_file_path str

The local file path.

required

Returns:

Name Type Description
int int

The size of the file in bytes.

Source code in src\aalibrary\utils\discrepancies.py
def get_local_file_size(local_file_path: str) -> int:
    """Gets the size of a local file in bytes.

    Args:
        local_file_path (str): The local file path.

    Returns:
        int: The size of the file in bytes.
    """
    return os.path.getsize(local_file_path)

get_local_sha256_checksum(local_file_path, chunk_size=65536)

Calculates the SHA256 checksum of a file.

Parameters:

Name Type Description Default
local_file_path str

The path to the file.

required
chunk_size int

The size of chunks to read the file in (in bytes). Larger chunks can be more efficient for large files.

65536

Returns:

Name Type Description
str str

The SHA256 checksum of the file as a hexadecimal string.

Source code in src\aalibrary\utils\discrepancies.py
def get_local_sha256_checksum(local_file_path, chunk_size=65536) -> str:
    """
    Calculates the SHA256 checksum of a file.

    Args:
        local_file_path (str): The path to the file.
        chunk_size (int): The size of chunks to read the file in (in bytes).
                          Larger chunks can be more efficient for large files.

    Returns:
        str: The SHA256 checksum of the file as a hexadecimal string.
    """

    sha256_hash = hashlib.sha256()
    try:
        with open(local_file_path, "rb") as f:
            # Read the file in chunks to handle large files efficiently
            for chunk in iter(lambda: f.read(chunk_size), b""):
                sha256_hash.update(chunk)
        return sha256_hash.hexdigest()
    except FileNotFoundError:
        return "File not found."
    except Exception as e:
        return f"An error occurred: {e}"

frequency_data

This module contains the FrequencyData class.

Classes:

Name Description
FrequencyData

Given some dataset 'Sv', list all frequencies available. This class offers methods which help map out frequencies and channels, plus additional utilities.

Functions:

Name Description
main

Opens a sample netCDF file and constructs a FrequencyData object from it.

FrequencyData

Given some dataset 'Sv', list all frequencies available. This class offers methods which help map out frequencies and channels plus additional utilities.

Methods:

Name Description
__init__

Initializes the class object and parses the frequencies available within the echodata object (xarray.Dataset) 'Sv'.

construct_frequency_list

Parses the frequencies available in the xarray 'Sv'

construct_frequency_map

Either using a channel_list or a frequency_list, this function provides a frequency map that satisfies the requirements of this class.

construct_frequency_pair_combination_list

Returns a list of tuple elements containing frequency combinations, which is useful for the KMeansOperator class.

construct_frequency_set_combination_list

Constructs a list of available frequency set permutations.

powerset

Generates all combinations of elements of an iterable (the powerset).

print_frequency_list

Prints each frequency element available in Sv.

print_frequency_pair_combination_list

Prints frequency combination list one element at a time.

print_frequency_set_combination_list

Prints frequency combination list one element at a time.

Source code in src\aalibrary\utils\frequency_data.py
class FrequencyData:
    """Given some dataset 'Sv', list all frequencies available. This class
    offers methods which help map out frequencies and channels plus additional
    utilities."""

    def __init__(self, Sv):
        """Initializes class object and parses the frequencies available
        within the echodata object (xarray.Dataset) 'Sv'.

        Args:
            Sv (xarray.Dataset): The 'Sv' echodata object.
        """

        self.Sv = Sv  # Create a self object.
        self.frequency_list = []  # Declares a frequency list to be modified.

        self.construct_frequency_list()  # Construct the frequency list.
        # TODO : This string needs cleaning up ; remove unneeded commas and
        # empty tuples.
        # Constructs a list of available frequency set permutations.
        # Example : [('18 kHz',), ('38 kHz',), ('120 kHz',), ('200 kHz',),
        # ('18 kHz', '38 kHz'), ('18 kHz', '120 kHz'), ('18 kHz', '200 kHz'),
        # ('38 kHz', '120 kHz'), ('38 kHz', '200 kHz'), ('120 kHz', '200 kHz'),
        # ('18 kHz', '38 kHz', '120 kHz'), ('18 kHz', '38 kHz', '200 kHz'),
        # ('18 kHz', '120 kHz', '200 kHz'), ('38 kHz', '120 kHz', '200 kHz'),
        # ('18 kHz', '38 kHz', '120 kHz', '200 kHz')]
        self.frequency_set_combination_list = (
            self.construct_frequency_set_combination_list()
        )
        # print(self.frequency_set_combination_list)
        # Constructs a list of all possible unequal permutation pairs of
        # frequencies.
        # Example : [('18 kHz', '38 kHz'), ('18 kHz', '120 kHz'),
        # ('18 kHz', '200 kHz'), ('38 kHz', '120 kHz'), ('38 kHz', '200 kHz'),
        # ('120 kHz', '200 kHz')]
        self.frequency_pair_combination_list = (
            self.construct_frequency_pair_combination_list()
        )
        # print(self.frequency_pair_combination_list)
        self.construct_frequency_map()

    def construct_frequency_list(self):
        """Parses the frequencies available in the xarray 'Sv'"""
        # Iterate through the natural index associated with Sv.Sv
        for i in range(len(self.Sv.Sv)):
            # Extract frequency.
            self.frequency_list.append(
                str(self.Sv.Sv[i].coords.get("channel"))
                .split(" kHz")[0]
                .split("GPT")[1]
                .strip()
                + " kHz"
            )
        # Log the constructed frequency list.
        logger.debug(f"Constructed frequency list: {self.frequency_list}")
        # Return string array frequency list of the form [18kHz, 70kHz, 200kHz]
        return self.frequency_list

    def powerset(self, iterable):
        """Generates combinations of elements of iterables ;
        powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)

        Args:
            iterable (_type_): A list.

        Returns combinations of elements of iterables.
        """
        # Make a list from the iterable.
        s = list(iterable)
        # Returns a list of tuple elements containing combinations of elements
        # which derived from the iterable object.
        return chain.from_iterable(
            combinations(s, r) for r in range(len(s) + 1)
        )

    def construct_frequency_set_combination_list(self) -> List[Tuple]:
        """Constructs a list of available frequency set permutations.
        Example : [
            ('18 kHz',), ('38 kHz',), ('120 kHz',), ('200 kHz',),
            ('18 kHz', '38 kHz'), ('18 kHz', '120 kHz'), ('18 kHz', '200 kHz'),
            ('38 kHz', '120 kHz'), ('38 kHz', '200 kHz'),
            ('120 kHz', '200 kHz'), ('18 kHz', '38 kHz', '120 kHz'),
            ('18 kHz', '38 kHz', '200 kHz'),('18 kHz', '120 kHz', '200 kHz'),
            ('38 kHz', '120 kHz', '200 kHz'),
            ('18 kHz', '38 kHz', '120 kHz', '200 kHz')]


        Returns:
            list<tuple>: A list of tuple elements containing frequency
                combinations which is useful for the KMeansOperator class.
        """
        # Returns a list of tuple elements containing frequency combinations
        # which is useful for the KMeansOperator class.
        return list(self.powerset(self.frequency_list))

    def print_frequency_set_combination_list(self):
        """Prints frequency combination list one element at a time."""

        for (
            i
        ) in (
            self.frequency_set_combination_list
        ):  # For each frequency combination associated with Sv.
            print(i)  # Print out frequency combination tuple.

    def construct_frequency_pair_combination_list(self) -> List[Tuple]:
        """Returns a list of tuple elements containing frequency combinations
        which is useful for the KMeansOperator class.

        Returns:
            list<tuple>: A list of tuple elements containing frequency
                combinations which is useful for the KMeansOperator class.
        """
        # Returns a list of tuple elements containing frequency combinations
        # which is useful for the KMeansOperator class.
        return list(itertools.combinations(self.frequency_list, 2))

    def print_frequency_pair_combination_list(self):
        """Prints frequency combination list one element at a time."""

        # For each frequency combination associated with Sv.
        for i in self.frequency_pair_combination_list:
            # Print out frequency combination tuple.
            print(i)

    def print_frequency_list(self):
        """Prints each frequency element available in Sv."""
        # For each frequency in the frequency_list associated with Sv.
        for i in self.frequency_list:
            # Print out the associated frequency.
            print(i)

    def construct_frequency_map(self, frequencies_provided=True):
        """Either using a channel_list or a frequency_list this function
        provides one which satisfies all requirements of this class structure.
        In particular the channels and frequencies involved have to be known
        and mapped to one another.

        Args:
            frequencies_provided (boolean): Was a frequency_list provided at
                object creation? If so, 'True'; if a channel_list was used
                instead, 'False'.
        """
        if frequencies_provided is True:
            self.simple_frequency_list = self.frequency_list
            # Declare a frequency map to be populated with string frequencies
            # of the form [[1,'38kHz'],[2,'120kHz'],[4,'200kHz']] where the
            # first element is meant to be the channel representing the
            # frequency. This is an internal object. Do not interfere.
            self.frequency_map = []
            # For each frequency 'j'.
            for j in self.simple_frequency_list:
                # Check each channel 'i'.
                for i in range(len(self.Sv.Sv)):
                    channel_desc = str(self.Sv.Sv[i].coords.get("channel"))
                    # If the channel description contains "ES" then it is an
                    # ES channel.
                    if "ES" in channel_desc:
                        numeric_frequency_desc = (
                            str(self.Sv.Sv[i].coords.get("channel"))
                            .split("ES")[1]
                            .split("-")[0]
                            .strip()
                        )
                        if numeric_frequency_desc == j.split("kHz")[0].strip():
                            self.frequency_map.append(
                                [i, numeric_frequency_desc + " kHz"]
                            )
                    # If the channel description contains "GPT" then it is a
                    # GPT channel.
                    if "GPT" in channel_desc:
                        numeric_frequency_desc = (
                            str(self.Sv.Sv[i].coords.get("channel"))
                            .split(" kHz")[0]
                            .split("GPT")[1]
                            .strip()
                        )
                        # To see if the channel associates with the
                        # frequency 'j' .
                        if numeric_frequency_desc == j.split("kHz")[0].strip():
                            # If so append it and the channel to the
                            # 'frequency_list'.
                            self.frequency_map.append(
                                [i, numeric_frequency_desc + " kHz"]
                            )
        else:
            # A channel_list was provided instead of a frequency_list.
            self.frequency_map = []
            for i in self.channel_list:
                channel_desc = str(self.Sv.Sv[i].coords.get("channel"))
                # If the channel description contains "ES" then it is an ES
                # channel.
                if "ES" in channel_desc:
                    self.frequency_map.append(
                        [
                            i,
                            channel_desc.split(" kHz")[0]
                            .split("ES")[1]
                            .strip()
                            + " kHz",
                        ]
                    )
                # If the channel description contains "GPT" then it is a
                # GPT channel.
                if "GPT" in channel_desc:
                    self.frequency_map.append(
                        [
                            i,
                            channel_desc.split(" kHz")[0]
                            .split("GPT")[1]
                            .strip()
                            + " kHz",
                        ]
                    )

        # Remove duplicates from frequency_list.
        self.frequency_map = [
            list(t) for t in set(tuple(item) for item in self.frequency_map)
        ]

__init__(Sv)

Initializes class object and parses the frequencies available within the echodata object (xarray.Dataset) 'Sv'.

Parameters:

Name Type Description Default
Sv Dataset

The 'Sv' echodata object.

required
Source code in src\aalibrary\utils\frequency_data.py
def __init__(self, Sv):
    """Initializes class object and parses the frequencies available
    within the echodata object (xarray.Dataset) 'Sv'.

    Args:
        Sv (xarray.Dataset): The 'Sv' echodata object.
    """

    self.Sv = Sv  # Create a self object.
    self.frequency_list = []  # Declares a frequency list to be modified.

    self.construct_frequency_list()  # Construct the frequency list.
    # TODO : This string needs cleaning up ; remove unneeded commas and
    # empty tuples.
    # Constructs a list of available frequency set permutations.
    # Example : [('18 kHz',), ('38 kHz',), ('120 kHz',), ('200 kHz',),
    # ('18 kHz', '38 kHz'), ('18 kHz', '120 kHz'), ('18 kHz', '200 kHz'),
    # ('38 kHz', '120 kHz'), ('38 kHz', '200 kHz'), ('120 kHz', '200 kHz'),
    # ('18 kHz', '38 kHz', '120 kHz'), ('18 kHz', '38 kHz', '200 kHz'),
    # ('18 kHz', '120 kHz', '200 kHz'), ('38 kHz', '120 kHz', '200 kHz'),
    # ('18 kHz', '38 kHz', '120 kHz', '200 kHz')]
    self.frequency_set_combination_list = (
        self.construct_frequency_set_combination_list()
    )
    # print(self.frequency_set_combination_list)
    # Constructs a list of all possible unequal permutation pairs of
    # frequencies.
    # Example : [('18 kHz', '38 kHz'), ('18 kHz', '120 kHz'),
    # ('18 kHz', '200 kHz'), ('38 kHz', '120 kHz'), ('38 kHz', '200 kHz'),
    # ('120 kHz', '200 kHz')]
    self.frequency_pair_combination_list = (
        self.construct_frequency_pair_combination_list()
    )
    # print(self.frequency_pair_combination_list)
    self.construct_frequency_map()

construct_frequency_list()

Parses the frequencies available in the xarray 'Sv'

Source code in src\aalibrary\utils\frequency_data.py
def construct_frequency_list(self):
    """Parses the frequencies available in the xarray 'Sv'"""
    # Iterate through the natural index associated with Sv.Sv
    for i in range(len(self.Sv.Sv)):
        # Extract frequency.
        self.frequency_list.append(
            str(self.Sv.Sv[i].coords.get("channel"))
            .split(" kHz")[0]
            .split("GPT")[1]
            .strip()
            + " kHz"
        )
    # Log the constructed frequency list.
    logger.debug(f"Constructed frequency list: {self.frequency_list}")
    # Return string array frequency list of the form [18kHz, 70kHz, 200kHz]
    return self.frequency_list

construct_frequency_map(frequencies_provided=True)

Using either a channel_list or a frequency_list, this function builds a frequency map that satisfies the requirements of this class structure. In particular, the channels and frequencies involved have to be known and mapped to one another.

Parameters:

Name Type Description Default
frequencies_provided boolean

Was a frequency_list provided at object creation? If so, 'True'; if a channel_list was used instead, 'False'.

True
Source code in src\aalibrary\utils\frequency_data.py
def construct_frequency_map(self, frequencies_provided=True):
    """Either using a channel_list or a frequency_list this function
    provides one which satisfies all requirements of this class structure.
    In particular the channels and frequencies involved have to be known
    and mapped to one another.

    Args:
        frequencies_provided (boolean): was a frequency_list provided at
            object creation? If so then 'True' if a channel_list instead
            was used then 'False'.
    """
    if frequencies_provided is True:
        self.simple_frequency_list = self.frequency_list
        # Declare a frequency map to be populated with string frequencies
        # of the form [[1,'38kHz'],[2,'120kHz'],[4,'200kHz']] where the
        # first element is meant to be the channel representing the
        # frequency. This is an internal object. Do not interfere.
        self.frequency_map = []
        # For each frequency 'j'.
        for j in self.simple_frequency_list:
            # Check each channel 'i'.
            for i in range(len(self.Sv.Sv)):
                channel_desc = str(self.Sv.Sv[i].coords.get("channel"))
                # If the channel description contains "ES" then it is an
                # ES channel.
                if "ES" in channel_desc:
                    numeric_frequency_desc = (
                        str(self.Sv.Sv[i].coords.get("channel"))
                        .split("ES")[1]
                        .split("-")[0]
                        .strip()
                    )
                    if numeric_frequency_desc == j.split("kHz")[0].strip():
                        self.frequency_map.append(
                            [i, numeric_frequency_desc + " kHz"]
                        )
                # If the channel description contains "GPT" then it is a
                # GPT channel.
                if "GPT" in channel_desc:
                    numeric_frequency_desc = (
                        str(self.Sv.Sv[i].coords.get("channel"))
                        .split(" kHz")[0]
                        .split("GPT")[1]
                        .strip()
                    )
                    # To see if the channel associates with the
                    # frequency 'j' .
                    if numeric_frequency_desc == j.split("kHz")[0].strip():
                        # If so append it and the channel to the
                        # 'frequency_list'.
                        self.frequency_map.append(
                            [i, numeric_frequency_desc + " kHz"]
                        )
    else:
        # A channel_list was provided instead; build the map directly from
        # the requested channel indices.
        self.frequency_map = []
        for i in self.channel_list:
            channel_desc = str(self.Sv.Sv[i].coords.get("channel"))
            # If the channel description contains "ES" then it is an ES
            # channel.
            if "ES" in channel_desc:
                self.frequency_map.append(
                    [
                        i,
                        str(self.Sv.Sv[i].coords.get("channel"))
                        .split(" kHz")[0]
                        .split("ES")[1]
                        .strip()
                        + " kHz",
                    ]
                )
            # If the channel description contains "GPT" then it is a
            # GPT channel.
            if "GPT" in channel_desc:
                self.frequency_map.append(
                    [
                        i,
                        str(self.Sv.Sv[i].coords.get("channel"))
                        .split(" kHz")[0]
                        .split("GPT")[1]
                        .strip()
                        + " kHz",
                    ]
                )

    # Remove duplicates from frequency_list.
    self.frequency_map = [
        list(t) for t in set(tuple(item) for item in self.frequency_map)
    ]

construct_frequency_pair_combination_list()

Returns a list of tuple elements containing frequency combinations which is useful for the KMeansOperator class.

Returns:

Type Description
List[Tuple]

list: A list of tuple elements containing frequency combinations which is useful for the KMeansOperator class.

Source code in src\aalibrary\utils\frequency_data.py
def construct_frequency_pair_combination_list(self) -> List[Tuple]:
    """Returns a list of tuple elements containing frequency combinations
    which is useful for the KMeansOperator class.

    Returns:
        list<tuple>: A list of tuple elements containing frequency
            combinations which is useful for the KMeansOperator class.
    """
    # Returns a list of tuple elements containing frequency combinations
    # which is useful for the KMeansOperator class.
    return list(itertools.combinations(self.frequency_list, 2))

construct_frequency_set_combination_list()

Constructs a list of available frequency set permutations. Example : [ ('18 kHz',), ('38 kHz',), ('120 kHz',), ('200 kHz',), ('18 kHz', '38 kHz'), ('18 kHz', '120 kHz'), ('18 kHz', '200 kHz'), ('38 kHz', '120 kHz'), ('38 kHz', '200 kHz'), ('120 kHz', '200 kHz'), ('18 kHz', '38 kHz', '120 kHz'), ('18 kHz', '38 kHz', '200 kHz'),('18 kHz', '120 kHz', '200 kHz'), ('38 kHz', '120 kHz', '200 kHz'), ('18 kHz', '38 kHz', '120 kHz', '200 kHz')]

Returns:

Type Description
List[Tuple]

list: A list of tuple elements containing frequency combinations which is useful for the KMeansOperator class.

Source code in src\aalibrary\utils\frequency_data.py
def construct_frequency_set_combination_list(self) -> List[Tuple]:
    """Constructs a list of available frequency set permutations.
    Example : [
        ('18 kHz',), ('38 kHz',), ('120 kHz',), ('200 kHz',),
        ('18 kHz', '38 kHz'), ('18 kHz', '120 kHz'), ('18 kHz', '200 kHz'),
        ('38 kHz', '120 kHz'), ('38 kHz', '200 kHz'),
        ('120 kHz', '200 kHz'), ('18 kHz', '38 kHz', '120 kHz'),
        ('18 kHz', '38 kHz', '200 kHz'),('18 kHz', '120 kHz', '200 kHz'),
        ('38 kHz', '120 kHz', '200 kHz'),
        ('18 kHz', '38 kHz', '120 kHz', '200 kHz')]


    Returns:
        list<tuple>: A list of tuple elements containing frequency
            combinations which is useful for the KMeansOperator class.
    """
    # Returns a list of tuple elements containing frequency combinations
    # which is useful for the KMeansOperator class.
    return list(self.powerset(self.frequency_list))

powerset(iterable)

Generates all combinations of the elements of an iterable; powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)

Parameters:

Name Type Description Default
iterable _type_

A list.

required

Returns combinations of elements of iterables.

Source code in src\aalibrary\utils\frequency_data.py
def powerset(self, iterable):
    """Generates combinations of elements of iterables ;
    powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)

    Args:
        iterable (_type_): A list.

    Returns combinations of elements of iterables.
    """
    # Make a list from the iterable.
    s = list(iterable)
    # Returns a list of tuple elements containing combinations of elements
    # which derived from the iterable object.
    return chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1)
    )

print_frequency_list()

Prints each frequency element available in Sv.

Source code in src\aalibrary\utils\frequency_data.py
def print_frequency_list(self):
    """Prints each frequency element available in Sv."""
    # For each frequency in the frequency_list associated with Sv.
    for i in self.frequency_list:
        # Print out the associated frequency.
        print(i)

print_frequency_pair_combination_list()

Prints frequency combination list one element at a time.

Source code in src\aalibrary\utils\frequency_data.py
def print_frequency_pair_combination_list(self):
    """Prints frequency combination list one element at a time."""

    # For each frequency combination associated with Sv.
    for i in self.frequency_pair_combination_list:
        # Print out frequency combination tuple.
        print(i)

print_frequency_set_combination_list()

Prints frequency combination list one element at a time.

Source code in src\aalibrary\utils\frequency_data.py
def print_frequency_set_combination_list(self):
    """Prints frequency combination list one element at a time."""

    for (
        i
    ) in (
        self.frequency_set_combination_list
    ):  # For each frequency combination associated with Sv.
        print(i)  # Print out frequency combination tuple.

main()

Opens a sample netCDF file and constructs a FrequencyData object to extract frequency information from it.

Source code in src\aalibrary\utils\frequency_data.py
def main():
    """Opens a sample netCDF file and constructs a FrequencyData object to
    extract frequency information from it."""

    input_path = "/home/mryan/Desktop/HB1603_L1-D20160707-T190150.nc"
    ed = ep.open_converted(input_path)
    Sv = ep.calibrate.compute_Sv(ed)

    freq_data = FrequencyData(Sv)
    logger.debug(freq_data.frequency_map)
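
A minimal usage sketch, assuming the import path aalibrary.utils.frequency_data, that echopype is installed, and a locally available converted netCDF file (the file name below is illustrative only):

import echopype as ep

from aalibrary.utils.frequency_data import FrequencyData

ed = ep.open_converted("HB1603_L1-D20160707-T190150.nc")  # hypothetical local file
Sv = ep.calibrate.compute_Sv(ed)

freq_data = FrequencyData(Sv)
freq_data.print_frequency_list()                  # e.g. 18 kHz, 38 kHz, ...
print(freq_data.frequency_pair_combination_list)  # e.g. [('18 kHz', '38 kHz'), ...]
print(freq_data.frequency_map)                    # e.g. [[0, '18 kHz'], [1, '38 kHz'], ...]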

gcp_utils

This file contains code pertaining to auxiliary functions related to parsing through our google storage bucket.

Functions:

Name Description
get_all_echosounders_in_a_survey_in_storage_bucket

Gets all of the echosounders in a survey in a GCP storage bucket.

get_all_ship_names_in_gcp_bucket

Gets all of the ship names within a GCP storage bucket.

get_all_survey_names_from_a_ship_in_storage_bucket

Gets all of the survey names from a particular ship in a GCP storage

get_all_surveys_in_storage_bucket

Gets all of the surveys in a GCP storage bucket.

get_all_echosounders_in_a_survey_in_storage_bucket(ship_name='', survey_name='', project_id='ggn-nmfs-aa-dev-1', gcp_bucket_name='ggn-nmfs-aa-dev-1-data', gcp_bucket=None, return_full_paths=False)

Gets all of the echosounders in a survey in a GCP storage bucket.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Will get normalized to GCP standards. Defaults to None.

''
survey_name str

The survey name/identifier. Defaults to "".

''
project_id str

The GCP project ID that the storage bucket resides in. Defaults to "ggn-nmfs-aa-dev-1".

'ggn-nmfs-aa-dev-1'
gcp_bucket_name str

The GCP storage bucket name. Defaults to "ggn-nmfs-aa-dev-1-data".

'ggn-nmfs-aa-dev-1-data'
gcp_bucket bucket

The GCP storage bucket client object. If none, one will be created for you based on the project_id and gcp_bucket_name. Defaults to None.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the survey names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings containing the echosounder names that exist in a survey.

Source code in src\aalibrary\utils\gcp_utils.py
def get_all_echosounders_in_a_survey_in_storage_bucket(
    ship_name: str = "",
    survey_name: str = "",
    project_id: str = "ggn-nmfs-aa-dev-1",
    gcp_bucket_name: str = "ggn-nmfs-aa-dev-1-data",
    gcp_bucket: storage.Client.bucket = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the echosounders in a survey in a GCP storage bucket.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Will get normalized to GCP standards. Defaults to None.
        survey_name (str, optional): The survey name/identifier.
            Defaults to "".
        project_id (str, optional): The GCP project ID that the storage bucket
            resides in.
            Defaults to "ggn-nmfs-aa-dev-1".
        gcp_bucket_name (str, optional): The GCP storage bucket name.
            Defaults to "ggn-nmfs-aa-dev-1-data".
        gcp_bucket (storage.Client.bucket, optional): The GCP storage bucket
            client object.
            If none, one will be created for you based on the `project_id` and
            `gcp_bucket_name`. Defaults to None.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the survey names listed. Defaults to False.

    Returns:
        List[str]: A list of strings containing the echosounder names that
            exist in a survey.
    """

    if gcp_bucket is None:
        _, _, gcp_bucket = setup_gcp_storage_objs(
            project_id=project_id, gcp_bucket_name=gcp_bucket_name
        )

    # Normalize the ship name.
    ship_name = normalize_ship_name(ship_name=ship_name)
    # Search all possible directories for ship surveys
    prefixes = [
        f"HDD/{ship_name}/{survey_name}/",
        f"NCEI/{ship_name}/{survey_name}/",
        f"OMAO/{ship_name}/{survey_name}/",
        f"TEST/{ship_name}/{survey_name}/",
    ]
    all_subfolder_names = set()
    all_echosounders = set()
    # Get all subfolders from this survey, whichever directory it resides in.
    for prefix in prefixes:
        subfolder_names = list_all_folders_in_gcp_bucket_location(
            location=prefix,
            gcp_bucket=gcp_bucket,
            return_full_paths=return_full_paths,
        )
        all_subfolder_names.update(subfolder_names)
    # Filter out any folder that is not an echosounder.
    for folder_name in list(all_subfolder_names):
        if (
            ("calibration" not in folder_name.lower())
            and ("metadata" not in folder_name.lower())
            and ("json" not in folder_name.lower())
            and ("doc" not in folder_name.lower())
        ):
            # Use 'add' since each 'folder_name' is a string.
            all_echosounders.add(folder_name)

    return list(all_echosounders)
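
A minimal usage sketch, assuming the import path aalibrary.utils.gcp_utils and the default dev project and bucket documented above (the ship and survey names are illustrative):

from aalibrary.utils.gcp_utils import (
    get_all_echosounders_in_a_survey_in_storage_bucket,
)

echosounders = get_all_echosounders_in_a_survey_in_storage_bucket(
    ship_name="Reuben Lasker",  # normalized internally to "Reuben_Lasker"
    survey_name="RL2107",
)
print(echosounders)  # e.g. ['EK80'], depending on what the bucket contains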

get_all_ship_names_in_gcp_bucket(project_id='ggn-nmfs-aa-dev-1', gcp_bucket_name='ggn-nmfs-aa-dev-1-data', gcp_bucket=None, return_full_paths=False)

Gets all of the ship names within a GCP storage bucket.

Parameters:

Name Type Description Default
project_id str

The GCP project ID that the storage bucket resides in. Defaults to "ggn-nmfs-aa-dev-1".

'ggn-nmfs-aa-dev-1'
gcp_bucket_name str

The GCP storage bucket name. Defaults to "ggn-nmfs-aa-dev-1-data".

'ggn-nmfs-aa-dev-1-data'
gcp_bucket bucket

The GCP storage bucket client object. If none, one will be created for you based on the project_id and gcp_bucket_name. Defaults to None.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False. NOTE: You can set this parameter to True if you would like to see which folders contain which ships. For example: Reuben Lasker can have data coming from both OMAO and local upload HDD. It will look like: {'OMAO/Reuben_Lasker/', 'HDD/Reuben_Lasker/'}

False

Returns:

Type Description
List[str]

List[str]: A list of strings containing the ship names.

Source code in src\aalibrary\utils\gcp_utils.py
def get_all_ship_names_in_gcp_bucket(
    project_id: str = "ggn-nmfs-aa-dev-1",
    gcp_bucket_name: str = "ggn-nmfs-aa-dev-1-data",
    gcp_bucket: storage.Client.bucket = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the ship names within a GCP storage bucket.

    Args:
        project_id (str, optional): The GCP project ID that the storage bucket
            resides in.
            Defaults to "ggn-nmfs-aa-dev-1".
        gcp_bucket_name (str, optional): The GCP storage bucket name.
            Defaults to "ggn-nmfs-aa-dev-1-data".
        gcp_bucket (storage.Client.bucket, optional): The GCP storage bucket
            client object.
            If none, one will be created for you based on the `project_id` and
            `gcp_bucket_name`. Defaults to None.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.
            NOTE: You can set this parameter to `True` if you would like to see
            which folders contain which ships.
            For example: Reuben Lasker can have data coming from both OMAO and
            local upload HDD. It will look like:
            {'OMAO/Reuben_Lasker/', 'HDD/Reuben_Lasker/'}

    Returns:
        List[str]: A list of strings containing the ship names.
    """

    if gcp_bucket is None:
        _, _, gcp_bucket = setup_gcp_storage_objs(
            project_id=project_id, gcp_bucket_name=gcp_bucket_name
        )
    # Get the initial subdirs
    prefixes = ["HDD/", "NCEI/", "OMAO/", "TEST/"]
    all_ship_names = set()
    for prefix in prefixes:
        ship_names = list_all_folders_in_gcp_bucket_location(
            location=prefix,
            gcp_bucket=gcp_bucket,
            return_full_paths=return_full_paths,
        )
        all_ship_names.update(ship_names)

    return list(all_ship_names)

get_all_survey_names_from_a_ship_in_storage_bucket(ship_name='', project_id='ggn-nmfs-aa-dev-1', gcp_bucket_name='ggn-nmfs-aa-dev-1-data', gcp_bucket=None, return_full_paths=False)

Gets all of the survey names from a particular ship in a GCP storage bucket.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Will get normalized to GCP standards. Defaults to None.

''
project_id str

The GCP project ID that the storage bucket resides in. Defaults to "ggn-nmfs-aa-dev-1".

'ggn-nmfs-aa-dev-1'
gcp_bucket_name str

The GCP storage bucket name. Defaults to "ggn-nmfs-aa-dev-1-data".

'ggn-nmfs-aa-dev-1-data'
gcp_bucket bucket

The GCP storage bucket client object. If none, one will be created for you based on the project_id and gcp_bucket_name. Defaults to None.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the survey names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings containing the survey names.

Source code in src\aalibrary\utils\gcp_utils.py
def get_all_survey_names_from_a_ship_in_storage_bucket(
    ship_name: str = "",
    project_id: str = "ggn-nmfs-aa-dev-1",
    gcp_bucket_name: str = "ggn-nmfs-aa-dev-1-data",
    gcp_bucket: storage.Client.bucket = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the survey names from a particular ship in a GCP storage
    bucket.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Will get normalized to GCP standards. Defaults to None.
        project_id (str, optional): The GCP project ID that the storage bucket
            resides in.
            Defaults to "ggn-nmfs-aa-dev-1".
        gcp_bucket_name (str, optional): The GCP storage bucket name.
            Defaults to "ggn-nmfs-aa-dev-1-data".
        gcp_bucket (storage.Client.bucket, optional): The GCP storage bucket
            client object.
            If none, one will be created for you based on the `project_id` and
            `gcp_bucket_name`. Defaults to None.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the survey names listed. Defaults to False.

    Returns:
        List[str]: A list of strings containing the survey names.
    """

    if gcp_bucket is None:
        _, _, gcp_bucket = setup_gcp_storage_objs(
            project_id=project_id, gcp_bucket_name=gcp_bucket_name
        )

    # Normalize the ship name.
    ship_name = normalize_ship_name(ship_name=ship_name)
    # Search all possible directories for ship surveys
    prefixes = [
        f"HDD/{ship_name}/",
        f"NCEI/{ship_name}/",
        f"OMAO/{ship_name}/",
        f"TEST/{ship_name}/",
    ]
    all_survey_names = set()
    for prefix in prefixes:
        survey_names = list_all_folders_in_gcp_bucket_location(
            location=prefix,
            gcp_bucket=gcp_bucket,
            return_full_paths=return_full_paths,
        )
        all_survey_names.update(survey_names)

    return list(all_survey_names)

get_all_surveys_in_storage_bucket(project_id='ggn-nmfs-aa-dev-1', gcp_bucket_name='ggn-nmfs-aa-dev-1-data', gcp_bucket=None, return_full_paths=False)

Gets all of the surveys in a GCP storage bucket.

Parameters:

Name Type Description Default
project_id str

The GCP project ID that the storage bucket resides in. Defaults to "ggn-nmfs-aa-dev-1".

'ggn-nmfs-aa-dev-1'
gcp_bucket_name str

The GCP storage bucket name. Defaults to "ggn-nmfs-aa-dev-1-data".

'ggn-nmfs-aa-dev-1-data'
gcp_bucket bucket

The GCP storage bucket client object. If none, one will be created for you based on the project_id and gcp_bucket_name. Defaults to None.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the survey names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings containing the survey names.

Source code in src\aalibrary\utils\gcp_utils.py
def get_all_surveys_in_storage_bucket(
    project_id: str = "ggn-nmfs-aa-dev-1",
    gcp_bucket_name: str = "ggn-nmfs-aa-dev-1-data",
    gcp_bucket: storage.Client.bucket = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the surveys in a GCP storage bucket.

    Args:
        project_id (str, optional): The GCP project ID that the storage bucket
            resides in.
            Defaults to "ggn-nmfs-aa-dev-1".
        gcp_bucket_name (str, optional): The GCP storage bucket name.
            Defaults to "ggn-nmfs-aa-dev-1-data".
        gcp_bucket (storage.Client.bucket, optional): The GCP storage bucket
            client object.
            If none, one will be created for you based on the `project_id` and
            `gcp_bucket_name`. Defaults to None.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the survey names listed. Defaults to False.

    Returns:
        List[str]: A list of strings containing the survey names.
    """

    if gcp_bucket is None:
        _, gcp_bucket_name, gcp_bucket = setup_gcp_storage_objs(
            project_id=project_id, gcp_bucket_name=gcp_bucket_name
        )

    all_ship_prefixes = get_all_ship_names_in_gcp_bucket(
        project_id=project_id,
        gcp_bucket_name=gcp_bucket_name,
        gcp_bucket=gcp_bucket,
        return_full_paths=True,
    )
    all_surveys = set()
    for ship_prefix in all_ship_prefixes:
        # Get surveys from each ship prefix
        ship_surveys = list_all_folders_in_gcp_bucket_location(
            location=ship_prefix,
            gcp_bucket=gcp_bucket,
            return_full_paths=return_full_paths,
        )
        all_surveys.update(ship_surveys)

    return list(all_surveys)
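
The functions above compose naturally. A sketch that reuses a single bucket client to list every ship and its surveys (import paths are assumptions; project and bucket names are the documented defaults):

from aalibrary.utils.cloud_utils import setup_gcp_storage_objs
from aalibrary.utils.gcp_utils import (
    get_all_ship_names_in_gcp_bucket,
    get_all_survey_names_from_a_ship_in_storage_bucket,
)

# Create the storage client objects once and pass the bucket down.
_, _, gcp_bucket = setup_gcp_storage_objs(
    project_id="ggn-nmfs-aa-dev-1",
    gcp_bucket_name="ggn-nmfs-aa-dev-1-data",
)
for ship_name in get_all_ship_names_in_gcp_bucket(gcp_bucket=gcp_bucket):
    surveys = get_all_survey_names_from_a_ship_in_storage_bucket(
        ship_name=ship_name, gcp_bucket=gcp_bucket
    )
    print(ship_name, surveys)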

helpers

For helper functions.

Functions:

Name Description
check_for_assertion_errors

Checks for errors in the kwargs provided.

create_azure_config_file

Creates an empty config file for azure storage keys.

get_all_objects_in_survey_from_ncei

Gets all of the object keys from a ship survey from the NCEI database.

get_all_ship_objects_from_ncei

Gets all of the object keys from a ship from the NCEI database.

get_file_name_from_url

Extracts the file name from a given storage bucket url. Includes the

get_file_paths_via_json_link

This function helps in getting the links from a json request, parsing

get_netcdf_gcp_location_from_raw_gcp_location

Gets the netcdf location of a raw file within GCP.

normalize_ship_name

Normalizes a ship's name. This is necessary for creating a deterministic

parse_correct_gcp_storage_bucket_location

Calculates the correct gcp storage location based on data source, file

parse_variables_from_ncei_file_url

Gets the file variables associated with a file url in NCEI.

check_for_assertion_errors(**kwargs)

Checks for errors in the kwargs provided.

Source code in src\aalibrary\utils\helpers.py
def check_for_assertion_errors(**kwargs):
    """Checks for errors in the kwargs provided."""

    if "file_name" in kwargs:
        assert kwargs["file_name"] != "", (
            "Please provide a valid file name with the file extension"
            " (ex. `2107RL_CW-D20210813-T220732.raw`)"
        )
    if "file_type" in kwargs:
        assert kwargs["file_type"] != "", "Please provide a valid file type."
        assert kwargs["file_type"] in config.VALID_FILETYPES, (
            "Please provide a valid file type (extension) "
            f"from the following: {config.VALID_FILETYPES}"
        )
    if "ship_name" in kwargs:
        assert kwargs["ship_name"] != "", (
            "Please provide a valid ship name "
            "(Title_Case_With_Underscores_As_Spaces)."
        )
    if "survey_name" in kwargs:
        assert (
            kwargs["survey_name"] != ""
        ), "Please provide a valid survey name."
    if "echosounder" in kwargs:
        assert (
            kwargs["echosounder"] != ""
        ), "Please provide a valid echosounder."
        assert kwargs["echosounder"] in config.VALID_ECHOSOUNDERS, (
            "Please provide a valid echosounder from the "
            f"following: {config.VALID_ECHOSOUNDERS}"
        )
    if "data_source" in kwargs:
        assert kwargs["data_source"] != "", (
            "Please provide a valid data source from the "
            f"following: {config.VALID_DATA_SOURCES}"
        )
        assert kwargs["data_source"] in config.VALID_DATA_SOURCES, (
            "Please provide a valid data source from the "
            f"following: {config.VALID_DATA_SOURCES}"
        )
    if "file_download_directory" in kwargs:
        assert (
            kwargs["file_download_directory"] != ""
        ), "Please provide a valid file download directory."
        assert os.path.isdir(kwargs["file_download_directory"]), (
            f"File download location `{kwargs['file_download_directory']}` is"
            " not found to be a valid dir, please reformat it."
        )
    if "gcp_bucket" in kwargs:
        assert kwargs["gcp_bucket"] is not None, (
            "Please provide a gcp_bucket object with"
            " `utils.cloud_utils.setup_gcp_storage()`"
        )
    if "directory" in kwargs:
        assert kwargs["directory"] != "", "Please provide a valid directory."
        assert os.path.isdir(kwargs["directory"]), (
            f"Directory location `{kwargs['directory']}` is not found to be a"
            " valid dir, please reformat it."
        )
    if "data_lake_directory_client" in kwargs:
        assert kwargs["data_lake_directory_client"] is not None, (
            f"The data lake directory client cannot be a"
            f" {type(kwargs['data_lake_directory_client'])} object. It needs "
            "to be of the type `DataLakeDirectoryClient`."
        )
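
A minimal sketch of how the validation is typically used, assuming the import path aalibrary.utils.helpers and that "raw" and "EK80" are among the configured valid file types and echosounders:

from aalibrary.utils.helpers import check_for_assertion_errors

# Raises an AssertionError with a descriptive message if any value is invalid.
check_for_assertion_errors(
    file_name="2107RL_CW-D20210813-T220732.raw",
    file_type="raw",
    ship_name="Reuben_Lasker",
    survey_name="RL2107",
    echosounder="EK80",
)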

create_azure_config_file(download_directory='')

Creates an empty config file for azure storage keys.

Parameters:

Name Type Description Default
download_directory str

The directory to store the azure config file. Defaults to "".

''
Source code in src\aalibrary\utils\helpers.py
def create_azure_config_file(download_directory: str = ""):
    """Creates an empty config file for azure storage keys.

    Args:
        download_directory (str, optional): The directory to store the
            azure config file. Defaults to "".
    """

    assert (
        download_directory != ""
    ), "Please provide a valid download directory."
    download_directory = os.path.normpath(download_directory)
    assert os.path.isdir(download_directory), (
        f"Directory location `{download_directory}` is not found to be a"
        " valid dir, please reformat it."
    )

    azure_config_file_path = os.path.join(
        download_directory, "azure_config.ini"
    )

    empty_config_str = """[DEFAULT]
azure_storage_account_name = 
azure_storage_account_key = 
azure_account_url = 
azure_connection_string = """

    with open(
        azure_config_file_path, "w", encoding="utf-8"
    ) as azure_config_file:
        azure_config_file.write(empty_config_str)

    print(
        f"Please fill out the azure config file at: {azure_config_file_path}"
    )
    return azure_config_file_path

get_all_objects_in_survey_from_ncei(ship_name='', survey_name='', s3_bucket=None)

Gets all of the object keys from a ship survey from the NCEI database.

Parameters:

Name Type Description Default
ship_name str

The name of the ship. Must be title-case and have spaces substituted for underscores. Defaults to "".

''
survey_name str

The name of the survey. Must match what we have in the NCEI database. Defaults to "".

''
s3_bucket resource

The boto3 bucket resource for the bucket that the ship data resides in. Defaults to None.

None

Returns:

Type Description
List[str]

List[str]: A list of strings. Each one being an object key (path to the object inside of the bucket).

Source code in src\aalibrary\utils\helpers.py
def get_all_objects_in_survey_from_ncei(
    ship_name: str = "",
    survey_name: str = "",
    s3_bucket: boto3.resource = None,
) -> List[str]:
    """Gets all of the object keys from a ship survey from the NCEI database.

    Args:
        ship_name (str, optional): The name of the ship. Must be title-case
            and have spaces substituted for underscores. Defaults to "".
        survey_name (str, optional): The name of the survey. Must match what
            we have in the NCEI database. Defaults to "".
        s3_bucket (boto3.resource, optional): The boto3 bucket resource for
            the bucket that the ship data resides in. Defaults to None.

    Returns:
        List[str]: A list of strings. Each one being an object key (path to
            the object inside of the bucket).
    """

    assert ship_name != "", (
        "Please provide a valid Titlecase",
        " ship_name using underscores as spaces.",
    )
    assert " " not in ship_name, (
        "Please provide a valid Titlecase",
        " ship_name using underscores as spaces.",
    )
    assert survey_name != "", "Please provide a valid survey name."
    assert s3_bucket is not None, "Please pass in a boto3 bucket object."

    survey_objects = []

    for obj in s3_bucket.objects.filter(
        Prefix=f"data/raw/{ship_name}/{survey_name}"
    ):
        survey_objects.append(obj.key)

    return survey_objects
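
A hedged sketch of building the boto3 bucket resource: the NCEI water-column bucket (noaa-wcsd-pds) is public, so an unsigned client is typically sufficient, and create_s3_objs from cloud_utils can also be used to build these objects. Import paths and names here are assumptions:

import boto3
from botocore import UNSIGNED
from botocore.config import Config

from aalibrary.utils.helpers import get_all_objects_in_survey_from_ncei

# Anonymous (unsigned) access to the public NCEI bucket.
s3 = boto3.resource("s3", config=Config(signature_version=UNSIGNED))
bucket = s3.Bucket("noaa-wcsd-pds")

keys = get_all_objects_in_survey_from_ncei(
    ship_name="Reuben_Lasker", survey_name="RL2107", s3_bucket=bucket
)
print(len(keys), "objects found")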

get_all_ship_objects_from_ncei(ship_name='', bucket=None)

Gets all of the object keys from a ship from the NCEI database.

Parameters:

Name Type Description Default
ship_name str

The name of the ship. Must be title-case and have spaces substituted for underscores. Defaults to "".

''
bucket resource

The boto3 bucket resource for the bucket that the ship data resides in. Defaults to None.

None

Returns:

Type Description
List[str]

List[str]: A list of strings. Each one being an object key (path to the object inside of the bucket).

Source code in src\aalibrary\utils\helpers.py
def get_all_ship_objects_from_ncei(
    ship_name: str = "", bucket: boto3.resource = None
) -> List[str]:
    """Gets all of the object keys from a ship from the NCEI database.

    Args:
        ship_name (str, optional): The name of the ship. Must be title-case
            and have spaces substituted for underscores. Defaults to "".
        bucket (boto3.resource, optional): The boto3 bucket resource for the
            bucket that the ship data resides in. Defaults to None.

    Returns:
        List[str]: A list of strings. Each one being an object key (path to
            the object inside of the bucket).
    """

    assert ship_name != "", (
        "Please provide a valid Titlecase",
        " ship_name using underscores as spaces.",
    )
    assert " " not in ship_name, (
        "Please provide a valid Titlecase",
        " ship_name using underscores as spaces.",
    )
    assert bucket is not None, "Please pass in a boto3 bucket object."

    ship_objects = []

    for obj in bucket.objects.filter(Prefix=f"data/raw/{ship_name}"):
        ship_objects.append(obj.key)

    return ship_objects

get_file_name_from_url(url='')

Extracts the file name from a given storage bucket url. Includes the file extension.

Parameters:

Name Type Description Default
url str

The full url of the storage object. Defaults to "". Example: "https://noaa-wcsd-pds.s3.amazonaws.com/data/raw/Reuben_Lasker/RL2107/EK80/2107RL_CW-D20210813-T220732.raw"

''

Returns:

Name Type Description
str str

The file name. Example: 2107RL_CW-D20210813-T220732.raw

Source code in src\aalibrary\utils\helpers.py
def get_file_name_from_url(url: str = "") -> str:
    """Extracts the file name from a given storage bucket url. Includes the
    file extension.

    Args:
        url (str, optional): The full url of the storage object.
            Defaults to "".
            Example: "https://noaa-wcsd-pds.s3.amazonaws.com/data/raw/Reuben_La
                      sker/RL2107/EK80/2107RL_CW-D20210813-T220732.raw"

    Returns:
        str: The file name. Example: 2107RL_CW-D20210813-T220732.raw
    """

    return url.split("/")[-1]

get_file_paths_via_json_link(link='')

This function gets the links from a json request, parses the contents of that url into a json object, and prints the file name and the cloud path link (s3 bucket link) for each feature. Code from: https://www.ngdc.noaa.gov/mgg/wcd/S3_download.html

Parameters:

Name Type Description Default
link str

The link to the json url. Defaults to "".

''
Source code in src\aalibrary\utils\helpers.py
def get_file_paths_via_json_link(link: str = ""):
    """This function helps in getting the links from a json request, parsing
    the contents of that url into a json object. The output is a json of the
    filename, and the cloud path link (s3 bucket link).
    Code from: https://www.ngdc.noaa.gov/mgg/wcd/S3_download.html

    Args:
        link (str, optional): The link to the json url. Defaults to "".
    """

    url = requests.get(link, timeout=10)
    text = url.text
    contents = json.loads(text)
    for k in contents.keys():
        print(k)
    for i in contents["features"]:
        file_name = i["attributes"]["FILE_NAME"]
        cloud_path = i["attributes"]["CLOUD_PATH"]
        if cloud_path:
            print(f"{file_name}, {cloud_path}")

get_netcdf_gcp_location_from_raw_gcp_location(gcp_storage_bucket_location='')

Gets the netcdf location of a raw file within GCP.

Source code in src\aalibrary\utils\helpers.py
def get_netcdf_gcp_location_from_raw_gcp_location(
    gcp_storage_bucket_location: str = "",
):
    """Gets the netcdf location of a raw file within GCP."""

    gcp_storage_bucket_location = gcp_storage_bucket_location.replace(
        "/raw/", "/netcdf/"
    )
    # get rid of file extension and replace with netcdf
    netcdf_gcp_storage_bucket_location = (
        ".".join(gcp_storage_bucket_location.split(".")[:-1]) + ".nc"
    )

    return netcdf_gcp_storage_bucket_location
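
An illustrative example of the transformation (the location itself is hypothetical; the import path is an assumption):

from aalibrary.utils.helpers import get_netcdf_gcp_location_from_raw_gcp_location

raw_loc = "NCEI/Reuben_Lasker/RL2107/EK80/data/raw/2107RL_CW-D20210813-T220732.raw"
# Swaps the /raw/ path segment for /netcdf/ and the file extension for .nc.
print(get_netcdf_gcp_location_from_raw_gcp_location(raw_loc))
# NCEI/Reuben_Lasker/RL2107/EK80/data/netcdf/2107RL_CW-D20210813-T220732.nc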

normalize_ship_name(ship_name='')

Normalizes a ship's name. This is necessary for creating a deterministic file structure within our GCP storage bucket. The ship name is returned as a Title_Cased_And_Snake_Cased ship name, with no punctuation. Ex. HENRY B. BIGELOW will return Henry_B_Bigelow

Parameters:

Name Type Description Default
ship_name str

The ship name string. Defaults to "".

''

Returns:

Name Type Description
str str

The formatted and normalized version of the ship name.

Source code in src\aalibrary\utils\helpers.py
def normalize_ship_name(ship_name: str = "") -> str:
    """Normalizes a ship's name. This is necessary for creating a deterministic
    file structure within our GCP storage bucket.
    The ship name is returned as a Title_Cased_And_Snake_Cased ship name, with
    no punctuation.
    Ex. `HENRY B. BIGELOW` will return `Henry_B_Bigelow`

    Args:
        ship_name (str, optional): The ship name string. Defaults to "".

    Returns:
        str: The formatted and normalized version of the ship name.
    """

    # Lower case the string
    ship_name = ship_name.lower()
    # Un-normalize (replace `_` with ` ` to help further processing)
    # In the edge-case that users include an underscore.
    ship_name = ship_name.replace("_", " ")
    # Remove all punctuation.
    ship_name = "".join(
        [char for char in ship_name if char not in string.punctuation]
    )
    # Title-case it
    ship_name = ship_name.title()
    # Snake-case it
    ship_name = ship_name.replace(" ", "_")

    return ship_name
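
A quick sketch (the import path is an assumption):

from aalibrary.utils.helpers import normalize_ship_name

print(normalize_ship_name("HENRY B. BIGELOW"))  # Henry_B_Bigelow
print(normalize_ship_name("Reuben Lasker"))     # Reuben_Lasker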

parse_correct_gcp_storage_bucket_location(file_name='', file_type='', ship_name='', survey_name='', echosounder='', data_source='', is_metadata=False, is_survey_metadata=False, debug=False)

Calculates the correct gcp storage location based on data source, file type, and if the file is metadata or not.

Parameters:

Name Type Description Default
file_name str

The file name (includes extension). Defaults to "".

''
file_type str

The file type (does not include the dot "."). Defaults to "".

''
ship_name str

The ship name associated with this survey. Defaults to "".

''
survey_name str

The survey name/identifier. Defaults to "".

''
echosounder str

The echosounder used to gather the data. Defaults to "".

''
data_source str

The source of the data. Can be one of ["NCEI", "OMAO"]. Defaults to "".

''
is_metadata bool

Whether or not the file is a metadata file. Necessary since files that are considered metadata (metadata json, or readmes) are stored in a separate directory. Defaults to False.

False
is_survey_metadata bool

Whether or not the file is a metadata file associated with a survey. The files are stored at the survey level, in the metadata/ folder. Defaults to False.

False
debug bool

Whether or not to print debug statements. Defaults to False.

False

Returns:

Name Type Description
str str

The correctly parsed GCP storage bucket location.

Source code in src\aalibrary\utils\helpers.py
def parse_correct_gcp_storage_bucket_location(
    file_name: str = "",
    file_type: str = "",
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    data_source: str = "",
    is_metadata: bool = False,
    is_survey_metadata: bool = False,
    debug: bool = False,
) -> str:
    """Calculates the correct gcp storage location based on data source, file
    type, and if the file is metadata or not.

    Args:
        file_name (str, optional): The file name (includes extension).
            Defaults to "".
        file_type (str, optional): The file type (not include the dot ".").
            Defaults to "".
        ship_name (str, optional): The ship name associated with this survey.
            Defaults to "".
        survey_name (str, optional): The survey name/identifier. Defaults
            to "".
        echosounder (str, optional): The echosounder used to gather the data.
            Defaults to "".
        data_source (str, optional): The source of the data. Can be one of
            ["NCEI", "OMAO"]. Defaults to "".
        is_metadata (bool, optional): Whether or not the file is a metadata
            file. Necessary since files that are considered metadata (metadata
            json, or readmes) are stored in a separate directory. Defaults to
            False.
        is_survey_metadata (bool, optional): Whether or not the file is a
            metadata file associated with a survey. The files are stored at
            the survey level, in the `metadata/` folder. Defaults to False.
        debug (bool, optional): Whether or not to print debug statements.
            Defaults to False.

    Returns:
        str: The correctly parsed GCP storage bucket location.
    """

    assert (
        (is_metadata and is_survey_metadata is False)
        or (is_metadata is False and is_survey_metadata)
        or (is_metadata is False and is_survey_metadata is False)
    ), (
        "Please make sure that only one of `is_metadata` and"
        " `is_survey_metadata` is True. Or you can set both to False."
    )

    # Creating the correct upload location
    if is_survey_metadata:
        gcp_storage_bucket_location = (
            f"{data_source}/{ship_name}/{survey_name}/metadata/{file_name}"
        )
    elif is_metadata:
        gcp_storage_bucket_location = (
            f"{data_source}/{ship_name}/{survey_name}/{echosounder}/metadata/"
        )
        # Figure out if its a raw or idx file (belongs in raw folder)
        if file_type.lower() in config.RAW_DATA_FILE_TYPES:
            gcp_storage_bucket_location = (
                gcp_storage_bucket_location + f"raw/{file_name}.json"
            )
        elif file_type.lower() in config.CONVERTED_DATA_FILE_TYPES:
            gcp_storage_bucket_location = (
                gcp_storage_bucket_location + f"netcdf/{file_name}.json"
            )
    else:
        # Figure out if its a raw or idx file (belongs in raw folder)
        if file_type.lower() in config.RAW_DATA_FILE_TYPES:
            gcp_storage_bucket_location = (
                f"{data_source}/{ship_name}/"
                f"{survey_name}/{echosounder}/data/raw/{file_name}"
            )
        elif file_type.lower() in config.CONVERTED_DATA_FILE_TYPES:
            gcp_storage_bucket_location = (
                f"{data_source}/{ship_name}/"
                f"{survey_name}/{echosounder}/data/netcdf/{file_name}"
            )

    if debug:
        logging.debug(
            "PARSED GCP_STORAGE_BUCKET_LOCATION: %s",
            gcp_storage_bucket_location,
        )

    return gcp_storage_bucket_location
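
A minimal sketch, assuming the import path aalibrary.utils.helpers and that "raw" is listed in config.RAW_DATA_FILE_TYPES:

from aalibrary.utils.helpers import parse_correct_gcp_storage_bucket_location

location = parse_correct_gcp_storage_bucket_location(
    file_name="2107RL_CW-D20210813-T220732.raw",
    file_type="raw",
    ship_name="Reuben_Lasker",
    survey_name="RL2107",
    echosounder="EK80",
    data_source="NCEI",
)
print(location)
# NCEI/Reuben_Lasker/RL2107/EK80/data/raw/2107RL_CW-D20210813-T220732.raw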

parse_variables_from_ncei_file_url(url='')

Gets the file variables associated with a file url in NCEI. File urls in NCEI follow this template: data/raw/{ship_name}/{survey_name}/{echosounder}/{file_name}

NOTE: file_name will include the extension.

Source code in src\aalibrary\utils\helpers.py
def parse_variables_from_ncei_file_url(url: str = ""):
    """Gets the file variables associated with a file url in NCEI.
    File urls in NCEI follow this template:
    data/raw/{ship_name}/{survey_name}/{echosounder}/{file_name}

    NOTE: file_name will include the extension."""

    file_name = get_file_name_from_url(url=url)
    file_type = file_name.split(".")[-1]
    echosounder = url.split("/")[-2]
    survey_name = url.split("/")[-3]
    ship_name = url.split("/")[-4]

    return file_name, file_type, echosounder, survey_name, ship_name
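
A sketch using an NCEI-style url of the documented form (the import path is an assumption):

from aalibrary.utils.helpers import parse_variables_from_ncei_file_url

url = (
    "https://noaa-wcsd-pds.s3.amazonaws.com/data/raw/"
    "Reuben_Lasker/RL2107/EK80/2107RL_CW-D20210813-T220732.raw"
)
file_name, file_type, echosounder, survey_name, ship_name = (
    parse_variables_from_ncei_file_url(url=url)
)
# ('2107RL_CW-D20210813-T220732.raw', 'raw', 'EK80', 'RL2107', 'Reuben_Lasker')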

ices

Functions:

Name Description
correct_dimensions_ices

Extracts angle data from echopype DataArray.

echopype_ek60_raw_to_ices_netcdf

Writes echodata Beam_group ds to a Beam_groupX netcdf file.

echopype_ek80_raw_to_ices_netcdf

Writes echodata Beam_group ds to a Beam_groupX netcdf file.

ragged_data_type_ices

Transforms a gridded 4 dimensional variable from an Echodata object

write_ek60_beamgroup_to_netcdf

Writes echopype Beam_group ds to a Beam_groupX netcdf file.

write_ek80_beamgroup_to_netcdf

Writes echodata Beam_group ds to a Beam_groupX netcdf file.

correct_dimensions_ices(echodata, variable_name='')

Extracts angle data from echopype DataArray.

Args: echodata (echopype.DataArray): Echopype echodata object containing data. variable_name (str): The name of the variable that needs to be transformed to a ragged array representation.

Returns: np.ndarray with the correct dimensions as specified by the ICES netCDF convention.

Source code in src\aalibrary\utils\ices.py
def correct_dimensions_ices(echodata, variable_name: str = "") -> np.ndarray:
    """Extracts angle data from echopype DataArray.

    Args:
    echodata (echopype.DataArray): Echopype echodata object containing data.
    variable_name (str): The name of the variable that needs to be transformed to
    a ragged array representation.

    Returns:
    np.array that returns array with correct dimension as specified by ICES netcdf convention.
    """
    num_pings = echodata["Sonar/Beam_group1"].sizes["ping_time"]
    num_channels = echodata["Sonar/Beam_group1"].sizes["channel"]

    compliant_np = np.empty((num_pings, num_channels))

    for ping_time_val in range(num_pings):
        compliant_np[ping_time_val, :] = (
            echodata["Sonar/Beam_group1"][variable_name]
            .values.transpose()
            .astype(np.float32)
        )

    return compliant_np

echopype_ek60_raw_to_ices_netcdf(echodata, export_file)

Writes echodata Beam_group ds to a Beam_groupX netcdf file.

Args: echodata (echopype.echodata): Echopype echodata object containing beam_group_data. export_file (str or Path): Path to the output NetCDF file.

Source code in src\aalibrary\utils\ices.py
def echopype_ek60_raw_to_ices_netcdf(echodata, export_file):
    """Writes echodata Beam_group ds to a Beam_groupX netcdf file.

    Args:
    echodata (echopype.echodata): Echopype echodata object containing beam_group_data.
    (echopype.DataArray): Echopype DataArray to be written.
    export_file (str or Path): Path to the NetCDF file.
    """

    engine = "netcdf4"

    output_file = validate_output_path(
        source_file=echodata.source_file,
        engine=engine,
        save_path=export_file,
        output_storage_options={},
    )

    save_file(
        echodata["Top-level"],
        path=output_file,
        mode="w",
        engine=engine,
        compression_settings=COMPRESSION_SETTINGS[engine],
    )
    save_file(
        echodata["Environment"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Environment",
        compression_settings=COMPRESSION_SETTINGS[engine],
    )
    save_file(
        echodata["Platform"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Platform",
        compression_settings=COMPRESSION_SETTINGS[engine],
    )

    save_file(
        echodata["Platform/NMEA"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Platform/NMEA",
        compression_settings=COMPRESSION_SETTINGS[engine],
    )

    save_file(
        echodata["Sonar"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Sonar",
        compression_settings=COMPRESSION_SETTINGS[engine],
    )

    # Write the Beam_group data via the dedicated helper.
    write_ek60_beamgroup_to_netcdf(echodata, output_file)

    save_file(
        echodata["Vendor_specific"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Vendor_specific",
        compression_settings=COMPRESSION_SETTINGS[engine],
    )
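
A minimal usage sketch, assuming the import path aalibrary.utils.ices, that echopype is installed, and a locally available EK60 .raw file (the file name below is illustrative only):

import echopype as ep

from aalibrary.utils.ices import echopype_ek60_raw_to_ices_netcdf

# Hypothetical local EK60 raw file; sonar_model must match the instrument.
echodata = ep.open_raw("D20160707-T190150.raw", sonar_model="EK60")
# Writes the Top-level, Environment, Platform, Sonar, Beam_group and
# Vendor_specific groups to an ICES-style netCDF file.
echopype_ek60_raw_to_ices_netcdf(echodata, "D20160707-T190150_ices.nc")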

echopype_ek80_raw_to_ices_netcdf(echodata, export_file)

Writes echodata Beam_group ds to a Beam_groupX netcdf file.

Args: echodata (echopype.echodata): Echopype echodata object containing beam_group_data. export_file (str or Path): Path to the output NetCDF file.

Source code in src\aalibrary\utils\ices.py
def echopype_ek80_raw_to_ices_netcdf(echodata, export_file):
    """Writes echodata Beam_group ds to a Beam_groupX netcdf file.

    Args:
    echodata (echopype.echodata): Echopype echodata object containing beam_group_data.
    (echopype.DataArray): Echopype DataArray to be written.
    export_file (str or Path): Path to the NetCDF file.
    """
    engine = "netcdf4"

    output_file = validate_output_path(
        source_file=echodata.source_file,
        engine=engine,
        save_path=export_file,
        output_storage_options={},
    )

    save_file(
        echodata["Top-level"],
        path=output_file,
        mode="w",
        engine=engine,
        compression_settings=COMPRESSION_SETTINGS[engine]
    )
    save_file(
        echodata["Environment"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Environment",
        compression_settings=COMPRESSION_SETTINGS[engine]
    )
    save_file(
        echodata["Platform"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Platform",
        compression_settings=COMPRESSION_SETTINGS[engine]
    )
    save_file(
        echodata["Platform/NMEA"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Platform/NMEA",
        compression_settings=COMPRESSION_SETTINGS[engine]
    )
    save_file(
        echodata["Sonar"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Sonar",
        compression_settings=COMPRESSION_SETTINGS[engine]
    )
    write_ek80_beamgroup_to_netcdf(echodata, output_file)
    save_file(
        echodata["Vendor_specific"],
        path=output_file,
        mode="a",
        engine=engine,
        group="Vendor_specific",
        compression_settings=COMPRESSION_SETTINGS[engine]
    )

ragged_data_type_ices(echodata, variable_name='')

Transforms a gridded 4 dimensional variable from an Echodata object into a ragged array representation.

Args: echodata (echopype.Echodata): Echopype echodata object containing a variable in the Beam_group1. variable_name (str): The name of the variable that needs to be transformed to a ragged array representation.

Returns: An ICES-compliant np.ndarray of dtype object.

Source code in src\aalibrary\utils\ices.py
def ragged_data_type_ices(echodata, variable_name: str = "") -> np.ndarray:
    """Transforms a gridded 4 dimensional variable from an Echodata object
    into a ragged array representation.

    Args:
    echodata (echopype.Echodata): Echopype echodata object containing a variable in the Beam_group1.
    variable_name (str): The name of the variable that needs to be transformed to
    a ragged array representation.

    Returns:
    ICES complain np array of type object.
    """

    num_pings = echodata["Sonar/Beam_group1"].sizes["ping_time"]
    num_channels = echodata["Sonar/Beam_group1"].sizes["channel"]
    num_beam = echodata["Sonar/Beam_group1"].sizes["beam"]

    compliant_np = np.empty((num_pings, num_channels, num_beam), object)

    for c, channel in enumerate(
        echodata["Sonar/Beam_group1"][variable_name].coords["channel"].values
    ):

        test = echodata["Sonar/Beam_group1"][variable_name].sel(channel=channel)

        # Find the first index along 'range_sample' where all values are NaN across 'beam'
        is_nan_across_beam = test.isnull().all(dim="beam")

        # Find the first index along 'range_sample' where 'is_nan_across_beam' is True
        first_nan_range_sample_indices = xr.apply_ufunc(
            np.argmax,
            is_nan_across_beam,
            input_core_dims=[["range_sample"]],
            exclude_dims=set(("range_sample",)),
            vectorize=True,  # Apply the function row-wise for each ping_time
            dask="parallelized",
            output_dtypes=[int],
        )

        found_nan_block_mask = is_nan_across_beam.isel(
            range_sample=first_nan_range_sample_indices.clip(min=0)
        )

        sample_t = []

        # Iterate through ping_time to populate sample_t
        for i, _ in enumerate(test["ping_time"].values):
            if found_nan_block_mask.isel(ping_time=i):
                value_to_append = (
                    test["range_sample"].values[
                        first_nan_range_sample_indices.isel(ping_time=i).item()
                    ]
                    - 1
                )
                sample_t.append(value_to_append)
            else:
                # If no all-NaN block was found, append the last range_sample index
                sample_t.append(test["range_sample"].values[-1])
        sample_t = np.array(sample_t)

        all_ping_segments = []

        for i, ping_da in enumerate(test):
            segment = ping_da.isel(range_sample=slice(sample_t[i])).values.transpose()
            all_ping_segments.append(segment)

        for i in range(len(compliant_np)):
            for j in range(4):
                compliant_np[i, c, j] = all_ping_segments[i][j].astype(np.float32)

    return compliant_np
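
For orientation, a sketch of what the returned object array looks like (the import path and file name are assumptions; backscatter_r is a variable present in Beam_group1 for EK60 data):

import echopype as ep

from aalibrary.utils.ices import ragged_data_type_ices

echodata = ep.open_raw("D20160707-T190150.raw", sonar_model="EK60")  # hypothetical file
ragged = ragged_data_type_ices(echodata, "backscatter_r")
print(ragged.shape)         # (num_pings, num_channels, num_beams)
print(ragged[0, 0, 0][:5])  # first few samples of ping 0, channel 0, beam 0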

write_ek60_beamgroup_to_netcdf(echodata, export_file)

Writes echopype Beam_group ds to a Beam_groupX netcdf file.

Parameters: echodata (echopype.echodata): Echopype echodata object containing the Beam_group data to be written. export_file (str or Path): Path to the output NetCDF file.

Source code in src\aalibrary\utils\ices.py
def write_ek60_beamgroup_to_netcdf(echodata, export_file):
    """
    Writes an echopype Beam_group dataset to Beam_groupX groups in a NetCDF file.

    Parameters:
    echodata (echopype.EchoData): Echopype EchoData object whose Beam_group1 data are written.
    export_file (str or Path): Path to the output NetCDF file.
    """
    ragged_backscatter_r_data = ragged_data_type_ices(echodata, "backscatter_r")
    beamwidth_receive_major_data = correct_dimensions_ices(
        echodata, "beamwidth_twoway_athwartship"
    )
    beamwidth_receive_minor_data = correct_dimensions_ices(
        echodata, "beamwidth_twoway_alongship"
    )
    echoangle_major_data = ragged_data_type_ices(echodata, "angle_athwartship")
    echoangle_minor_data = ragged_data_type_ices(echodata, "angle_alongship")
    equivalent_beam_angle_data = correct_dimensions_ices(
        echodata, "equivalent_beam_angle"
    )
    rx_beam_rotation_phi_data = (
        ragged_data_type_ices(echodata, "angle_athwartship") * -1
    )
    rx_beam_rotation_psi_data = np.zeros(
        (echodata["Sonar/Beam_group1"].sizes["ping_time"], 1)
    )
    rx_beam_rotation_theta_data = ragged_data_type_ices(echodata, "angle_alongship")

    for i in range(echodata["Sonar/Beam_group1"].sizes["channel"]):

        with netCDF4.Dataset(export_file, "a", format="netcdf4") as ncfile:
            grp = ncfile.createGroup(f"Sonar/Beam_group{i+1}")
            grp.setncattr("beam_mode", echodata["Sonar/Beam_group1"].attrs["beam_mode"])
            grp.setncattr(
                "conversion_equation_type",
                echodata["Sonar/Beam_group1"].attrs["conversion_equation_t"],
            )
            grp.setncattr(
                "long_name", echodata["Sonar/Beam_group1"].coords["channel"].values[i]
            )

            # Create the VLEN type for 32-bit floats
            sample_t = grp.createVLType(np.float32, "sample_t")
            angle_t = grp.createVLType(np.float32, "angle_t")

            # Create ping_time dimension and ping_time coordinate variable
            grp.createDimension("ping_time", None)

            ping_time_var = grp.createVariable("ping_time", np.int64, ("ping_time",))
            ping_time_var.units = "nanoseconds since 1970-01-01 00:00:00Z"
            ping_time_var.standard_name = "time"
            ping_time_var.long_name = "Time-stamp of each ping"
            ping_time_var.axis = "T"
            ping_time_var.calendar = "gregorian"
            ping_time_var[:] = echodata["Sonar/Beam_group1"].coords[
                "ping_time"
            ].values - np.datetime64("1970-01-01T00:00:00Z")

            # Create beam dimension and coordinate variable
            grp.createDimension("beam", 1)

            beam_var = grp.createVariable("beam", "S1", ("beam",))
            beam_var.long_name = "Beam name"
            beam_var[:] = echodata["Sonar/Beam_group1"].coords["channel"].values[i]

            # Create backscatter_r variable
            backscatter_r = grp.createVariable(
                "backscatter_r", sample_t, ("ping_time", "beam")
            )
            backscatter_r[:] = ragged_backscatter_r_data[:, i]
            backscatter_r.setncattr(
                "long_name", "Raw backscatter measurements (real part)"
            )
            backscatter_r.units = "dB"

            # Create beam_stabilisation variable
            beam_stablisation = grp.createVariable(
                "beam_stablisation", int, ("ping_time", "beam")
            )
            beam_stablisation[:] = np.zeros(
                (echodata["Sonar/Beam_group1"].sizes["ping_time"], 1)
            )
            beam_stablisation.setncattr(
                "long_name", "Beam stabilisation applied(or not)"
            )

            # Create beam_type variable
            beam_type = grp.createVariable("beam_type", int, ())
            beam_type[:] = echodata["Sonar/Beam_group1"]["beam_type"].values[i]
            beam_type.setncattr("long_name", "type of transducer (0-single, 1-split)")

            # Create beamwidth_receive_major variable
            beamwidth_receive_major = grp.createVariable(
                "beamwidth_receive_major", np.float32, ("ping_time", "beam")
            )
            beamwidth_receive_major[:] = beamwidth_receive_major_data[:, i]
            beamwidth_receive_major.setncattr(
                "long_name",
                "Half power one-way receive beam width along major (horizontal) axis of beam",
            )
            beamwidth_receive_major.units = "arc_degree"
            beamwidth_receive_major.valid_range = [0.0, 360.0]

            # Create beamwidth_receive_minor variable
            beamwidth_receive_minor = grp.createVariable(
                "beamwidth_receive_minor", np.float32, ("ping_time", "beam")
            )
            beamwidth_receive_minor[:] = beamwidth_receive_minor_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            beamwidth_receive_minor.setncattr(
                "long_name",
                "Half power one-way receive beam width along minor (vertical) axis of beam",
            )
            beamwidth_receive_minor.units = "arc_degree"
            beamwidth_receive_minor.valid_range = [0.0, 360.0]

            beamwidth_transmit_major = grp.createVariable(
                "beamwidth_transmit_major", np.float32, ("ping_time", "beam")
            )
            # Create beamwidth_transmit_major variable
            beamwidth_transmit_major[:] = beamwidth_receive_major_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            beamwidth_transmit_major.setncattr(
                "long_name",
                "Half power one-way receive beam width along major (horizontal) axis of beam",
            )
            beamwidth_transmit_major.units = "arc_degree"
            beamwidth_transmit_major.valid_range = [0.0, 360.0]

            # Create beamwidth_transmit_minor variable
            beamwidth_transmit_minor = grp.createVariable(
                "beamwidth_transmit_minor", np.float32, ("ping_time", "beam")
            )
            beamwidth_transmit_minor[:] = beamwidth_receive_minor_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            beamwidth_transmit_minor.setncattr(
                "long_name",
                "Half power one-way receive beam width along minor (vertical) axis of beam",
            )
            beamwidth_transmit_minor.units = "arc_degree"
            beamwidth_transmit_minor.valid_range = [0.0, 360.0]

            # Create blanking_interval variable
            blanking_interval = grp.createVariable(
                "blanking_interval", float, ("ping_time", "beam")
            )
            blanking_interval[:] = np.zeros(
                (echodata["Sonar/Beam_group1"].sizes["ping_time"], 1)
            )
            blanking_interval.setncattr(
                "long_name", "Beam stabilisation applied(or not)"
            )
            blanking_interval.units = "s"
            blanking_interval.valid_min = 0.0

            # Create calibrated_frequency variable
            calibrated_frequency = grp.createVariable(
                "calibrated_frequency", np.float64, ()
            )
            calibrated_frequency[:] = echodata["Sonar/Beam_group1"][
                "frequency_nominal"
            ].values[i]
            calibrated_frequency.setncattr("long_name", "Calibration gain frequencies")
            calibrated_frequency.units = "Hz"
            calibrated_frequency.valid_min = 0.0

            # Create echoangle_major variable (talk to joe about this)
            echoangle_major = grp.createVariable(
                "echoangle_major", angle_t, ("ping_time", "beam")
            )
            echoangle_major[:] = echoangle_major_data[:, i]
            echoangle_major.setncattr(
                "long_name", "Echo arrival angle in the major beam coordinate"
            )
            echoangle_major.units = "arc_degree"
            echoangle_major.valid_range = [-180.0, 180.0]

            # Create echoangle_minor variable
            echoangle_minor = grp.createVariable(
                "echoangle_minor", angle_t, ("ping_time", "beam")
            )
            echoangle_minor[:] = echoangle_minor_data[:, i]
            echoangle_minor.setncattr(
                "long_name", "Echo arrival angle in the minor beam coordinate"
            )
            echoangle_minor.units = "arc_degree"
            echoangle_minor.valid_range = [-180.0, 180.0]

            # Create echoangle_major sensitivity variable
            echoangle_major_sensitivity = grp.createVariable(
                "echoangle_major_sensitivityr", np.float64, ()
            )
            echoangle_major_sensitivity[:] = echodata["Sonar/Beam_group1"][
                "angle_sensitivity_athwartship"
            ].values[i]
            echoangle_major_sensitivity.setncattr(
                "long_name", "Major angle scaling factor"
            )
            echoangle_major_sensitivity.units = "1"
            echoangle_major_sensitivity.valid_min = 0.0

            # Create echoangle_minor sensitivity variable
            echoangle_minor_sensitivity = grp.createVariable(
                "echoangle_minor_sensitivity", np.float64, ()
            )
            echoangle_minor_sensitivity[:] = echodata["Sonar/Beam_group1"][
                "angle_sensitivity_alongship"
            ].values[i]
            echoangle_minor_sensitivity.setncattr(
                "long_name", "Minor angle scaling factor"
            )
            echoangle_minor_sensitivity.units = "1"
            echoangle_minor_sensitivity.valid_min = 0.0

            # Create equivalent_beam_angle variable (weird angle values)
            equivalent_beam_angle = grp.createVariable(
                "equivalent_beam_angle", np.float64, ("ping_time", "beam")
            )
            equivalent_beam_angle[:] = equivalent_beam_angle_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            equivalent_beam_angle.setncattr("long_name", "Equivalent beam angle")

            # Create frequency variable
            frequency = grp.createVariable("frequency", np.float64, ())
            frequency[:] = echodata["Sonar/Beam_group1"]["frequency_nominal"].values[i]
            frequency.setncattr("long_name", "Calibration gain frequencies")
            frequency.units = "Hz"
            frequency.valid_min = 0.0

            # Create non_quantitative_processing variable
            non_quantitative_processing = grp.createVariable(
                "non_quantitative_processing", int, ("ping_time")
            )
            non_quantitative_processing[:] = np.zeros(
                echodata["Sonar/Beam_group1"].sizes["ping_time"]
            )
            non_quantitative_processing.setncattr(
                "long_name",
                "Presence or not of non-quantitative processing applied to the backscattering data (sonar specific)",
            )

            # Create platoform_latitude variable
            platoform_latitude = grp.createVariable(
                "platoform_latitude", np.float64, ("ping_time")
            )
            platoform_latitude[:] = echodata["Platform"]["latitude"].interp(
                time1=echodata["Platform"].coords["time2"].values, method="nearest"
            )
            platoform_latitude.setncattr(
                "long_name", "Heading of the platform at time of the ping"
            )
            platoform_latitude.units = "degrees_north"
            platoform_latitude.valid_range = [-180.0, 180.0]

            # Create platoform_longitude variable
            platoform_longitude = grp.createVariable(
                "platoform_longitude", np.float64, ("ping_time")
            )
            platoform_longitude[:] = echodata["Platform"]["longitude"].interp(
                time1=echodata["Platform"].coords["time2"].values, method="nearest"
            )
            platoform_longitude.setncattr("long_name", "longitude")
            platoform_longitude.units = "degrees_east"
            platoform_longitude.valid_range = [-180.0, 180.0]

            # Create platoform_pitch variable
            platform_pitch = grp.createVariable(
                "platform_pitch", np.float64, ("ping_time")
            )
            platform_pitch[:] = echodata["Platform"]["pitch"].values
            platform_pitch.setncattr("long_name", "pitch_angle")
            platform_pitch.units = "arc_degree"
            platform_pitch.valid_range = [-90.0, 90.0]

            # Create platoform_roll variable
            platoform_roll = grp.createVariable(
                "platform_roll", np.float64, ("ping_time")
            )
            platoform_roll[:] = echodata["Platform"]["roll"].values
            platoform_roll.setncattr("long_name", "roll angle")
            platoform_roll.units = "arc_degree"

            # Create platoform_vertical_offset variable
            platoform_vertical_offset = grp.createVariable(
                "platoform_vertical_offset", np.float64, ("ping_time")
            )
            platoform_vertical_offset[:] = echodata["Platform"][
                "vertical_offset"
            ].values
            platoform_vertical_offset.setncattr(
                "long_name",
                "Platform vertical distance from reference point to the water line",
            )
            platoform_vertical_offset.units = "m"

            # Create rx_beam_rotation_phi variable
            rx_beam_rotation_phi = grp.createVariable(
                "rx_beam_rotation_phi", angle_t, ("ping_time", "beam")
            )
            rx_beam_rotation_phi[:] = rx_beam_rotation_phi_data[:, i]
            rx_beam_rotation_phi.setncattr(
                "long_name", "receive beam angular rotation about the x axis"
            )
            rx_beam_rotation_phi.units = "arc_degree"
            rx_beam_rotation_phi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_psi variable
            rx_beam_rotation_psi = grp.createVariable(
                "rx_beam_rotation_psi", np.float64, ("ping_time", "beam")
            )
            rx_beam_rotation_psi[:] = rx_beam_rotation_psi_data
            rx_beam_rotation_psi.setncattr(
                "long_name", "receive beam angular rotation about the z axis"
            )
            rx_beam_rotation_psi.units = "arc_degree"
            rx_beam_rotation_psi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_theta variable
            rx_beam_rotation_theta = grp.createVariable(
                "rx_beam_roation_theta", angle_t, ("ping_time", "beam")
            )
            rx_beam_rotation_theta[:] = rx_beam_rotation_theta_data[:, i]
            rx_beam_rotation_theta.setncattr(
                "long_name", "receive beam angular rotation about the y axis"
            )
            rx_beam_rotation_theta.units = "arc_degree"
            rx_beam_rotation_theta.valid_range = [-90.0, 90.0]

            # Create sample_interval variable
            sample_interval = grp.createVariable(
                "sample_interval", np.float64, ("ping_time", "beam")
            )
            sample_interval[:] = (
                echodata["Sonar/Beam_group1"]["sample_interval"]
                .transpose()
                .values[:, i]
            )
            sample_interval.setncattr("long_name", "Equivalent beam angle")
            sample_interval.units = "s"
            sample_interval.valid_min = 0.0
            sample_interval.coordinates = (
                "ping_time platform_latitude platform_longitude"
            )

            # Create sample_time_offset variable
            sample_time_offset = grp.createVariable(
                "sample_time_offset", np.float64, ("ping_time", "beam")
            )
            sample_time_offset[:] = (
                echodata["Sonar/Beam_group1"]["sample_time_offset"]
                .transpose()
                .values[:, i]
            )
            sample_time_offset.setncattr(
                "long_name",
                "Time offset that is subtracted from the timestamp of each sample",
            )
            sample_time_offset.units = "s"

            # Create transmit_duration_nominal variable
            transmit_duration_nominal = grp.createVariable(
                "transmit_duration_nominal", np.float64, ("ping_time", "beam")
            )
            transmit_duration_nominal[:] = (
                echodata["Sonar/Beam_group1"]["transmit_duration_nominal"]
                .transpose()
                .values[:, i]
            )
            transmit_duration_nominal.setncattr(
                "long_name", "Nominal duration of transmitted pulse"
            )
            transmit_duration_nominal.units = "Hz"
            transmit_duration_nominal.valid_min = 0.0

            # Create transmit_frequency_start variable
            transmit_frequency_start = grp.createVariable(
                "transmit_frequency_start", np.float64, ("ping_time")
            )
            transmit_frequency_start[:] = echodata["Sonar/Beam_group1"][
                "transmit_frequency_start"
            ].values[i]
            transmit_frequency_start.setncattr(
                "long_name", "Start frequency in transmitted pulse"
            )
            transmit_frequency_start.units = "Hz"
            transmit_frequency_start.valid_min = 0.0

            # Create transmit_frequency_stop variable
            transmit_frequency_stop = grp.createVariable(
                "transmit_frequency_stop", np.float64, ("ping_time")
            )
            transmit_frequency_stop[:] = echodata["Sonar/Beam_group1"][
                "transmit_frequency_stop"
            ].values[i]
            transmit_frequency_stop.setncattr(
                "long_name", "Stop frequency in transmitted pulse"
            )
            transmit_frequency_stop.units = "Hz"
            transmit_frequency_stop.valid_min = 0.0

            # Create transmit_power variable
            transmit_power = grp.createVariable(
                "transmit_power", np.float64, ("ping_time", "beam")
            )
            transmit_power[:] = (
                echodata["Sonar/Beam_group1"]["transmit_power"].transpose().values[:, i]
            )
            transmit_power.setncattr("long_name", "Nominal transmit power")
            transmit_power.units = "W"
            transmit_power.valid_min = 0.0

            # Create transmit_type
            transmit_type = grp.createVariable("transmit_type", np.float64, ())
            transmit_type[:] = 0
            transmit_type.setncattr("long_name", "Type of transmitted pulse")

            # Create tx_beam_rotation_phi variable
            tx_beam_roation_phi = grp.createVariable(
                "tx_beam_roation_phi", angle_t, ("ping_time", "beam")
            )
            tx_beam_roation_phi[:] = rx_beam_rotation_phi_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            tx_beam_roation_phi.setncattr(
                "long_name", "receive beam angular rotation about the x axis"
            )
            tx_beam_roation_phi.units = "arc_degree"
            tx_beam_roation_phi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_psi variable
            tx_beam_roation_psi = grp.createVariable(
                "tx_beam_roation_psi", np.float32, ("ping_time", "beam")
            )
            tx_beam_roation_psi[:] = rx_beam_rotation_psi_data
            tx_beam_roation_psi.setncattr(
                "long_name", "receive beam angular rotation about the z axis"
            )
            tx_beam_roation_psi.units = "arc_degree"
            tx_beam_roation_psi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_theta variable
            tx_beam_roation_theta = grp.createVariable(
                "tx_beam_roation_theta", angle_t, ("ping_time", "beam")
            )
            tx_beam_roation_theta[:] = rx_beam_rotation_theta_data[:, i]
            tx_beam_roation_theta.setncattr(
                "long_name", "receive beam angular rotation about the y axis"
            )
            tx_beam_roation_theta.units = "arc_degree"
            tx_beam_roation_theta.valid_range = [-90.0, 90.0]
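
A minimal usage sketch with placeholder file paths. Because the writer opens the target file in append mode ("a"), the output file is assumed to already exist, for example as the skeleton of an ICES SONAR-netCDF4 file produced by another step; the import path follows the source location shown above.

# Illustrative usage sketch (paths are placeholders; the import path is an
# assumption based on "Source code in src\aalibrary\utils\ices.py").
import echopype as ep

from aalibrary.utils.ices import write_ek60_beamgroup_to_netcdf

ed = ep.open_raw("example_ek60.raw", sonar_model="EK60")  # hypothetical raw file
# Appends one Sonar/Beam_groupN group per channel to the existing file.
write_ek60_beamgroup_to_netcdf(ed, "example_ices.nc")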

write_ek80_beamgroup_to_netcdf(echodata, export_file)

Writes an echopype Beam_group dataset to Beam_groupX groups in a NetCDF file.

Args: echodata (echopype.EchoData): Echopype EchoData object containing the beam group data. export_file (str or Path): Path to the output NetCDF file.

Source code in src\aalibrary\utils\ices.py
def write_ek80_beamgroup_to_netcdf(echodata, export_file):
    """Writes echodata Beam_group ds to a Beam_groupX netcdf file.

    Args:
    echodata (echopype.echodata): Echopype echodata object containing beam_group_data.
    (echopype.DataArray): Echopype DataArray to be written.
    export_file (str or Path): Path to the NetCDF file.
    """
    ragged_backscatter_r_data = ragged_data_type_ices(echodata, "backscatter_r")
    ragged_backscatter_i_data = ragged_data_type_ices(echodata, "backscatter_i")
    beamwidth_receive_major_data = correct_dimensions_ices(
        echodata, "beamwidth_twoway_athwartship"
    )
    beamwidth_receive_minor_data = correct_dimensions_ices(
        echodata, "beamwidth_twoway_alongship"
    )
    echoangle_major_data = correct_dimensions_ices(echodata, "angle_offset_athwartship")
    echoangle_minor_data = correct_dimensions_ices(echodata, "angle_offset_alongship")
    equivalent_beam_angle_data = correct_dimensions_ices(
        echodata, "equivalent_beam_angle"
    )
    rx_beam_rotation_phi_data = (
        correct_dimensions_ices(echodata, "angle_offset_athwartship") * -1
    )
    rx_beam_rotation_psi_data = np.zeros(
        (echodata["Sonar/Beam_group1"].sizes["ping_time"], 1)
    )
    rx_beam_rotation_theta_data = correct_dimensions_ices(
        echodata, "angle_offset_alongship"
    )

    for i in range(echodata["Sonar/Beam_group1"].sizes["channel"]):

        with netCDF4.Dataset(export_file, "a", format="netcdf4") as ncfile:
            grp = ncfile.createGroup(f"Sonar/Beam_group{i+1}")
            grp.setncattr("beam_mode", echodata["Sonar/Beam_group1"].attrs["beam_mode"])
            grp.setncattr(
                "conversion_equation_type",
                echodata["Sonar/Beam_group1"].attrs["conversion_equation_t"],
            )
            grp.setncattr(
                "long_name", echodata["Sonar/Beam_group1"].coords["channel"].values[i]
            )

            # Create the VLEN type for 32-bit floats
            sample_t = grp.createVLType(np.float32, "sample_t")

            # Create ping_time dimension and ping_time coordinate variable
            grp.createDimension("ping_time", None)

            ping_time_var = grp.createVariable("ping_time", np.int64, ("ping_time",))
            ping_time_var.units = "nanoseconds since 1970-01-01 00:00:00Z"
            ping_time_var.standard_name = "time"
            ping_time_var.long_name = "Time-stamp of each ping"
            ping_time_var.axis = "T"
            ping_time_var.calendar = "gregorian"
            ping_time_var[:] = echodata["Sonar/Beam_group1"].coords[
                "ping_time"
            ].values - np.datetime64("1970-01-01T00:00:00Z")

            # Create beam dimension and coordinate variable
            grp.createDimension("beam", 1)

            beam_var = grp.createVariable("beam", "S1", ("beam",))
            beam_var.long_name = "Beam name"
            beam_var[:] = echodata["Sonar/Beam_group1"].coords["channel"].values[i]

            # Create beam dimension and coordinate variable
            grp.createDimension("sub_beam", 4)

            sub_beam_var = grp.createVariable("sub_beam", np.int64, ("sub_beam",))
            sub_beam_var.long_name = "Beam quadrant number"
            sub_beam_var[:] = echodata["Sonar/Beam_group1"].coords["beam"].values

            # Create backscatter_r variable
            backscatter_r = grp.createVariable(
                "backscatter_r",
                sample_t,
                ("ping_time", "beam", "sub_beam"),
            )
            backscatter_r[:] = ragged_backscatter_r_data[:, i, :]
            backscatter_r.setncattr(
                "long_name", "Raw backscatter measurements (real part)"
            )
            backscatter_r.units = "dB"

            # Create backscatter_i variable
            backscatter_i = grp.createVariable(
                "backscatter_i", sample_t, ("ping_time", "beam", "sub_beam")
            )
            backscatter_i[:] = ragged_backscatter_i_data[:, i, :].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"],
                1,
                echodata["Sonar/Beam_group1"].sizes["beam"],
            )
            backscatter_i.setncattr(
                "long_name", "Raw backscatter measurements (imaginary part)"
            )
            backscatter_i.units = "dB"

            # Create beam_stabilisation variable
            beam_stablisation = grp.createVariable(
                "beam_stablisation", int, ("ping_time", "beam")
            )
            beam_stablisation[:] = np.zeros(
                (echodata["Sonar/Beam_group1"].sizes["ping_time"], 1)
            )
            beam_stablisation.setncattr(
                "long_name", "Beam stabilisation applied(or not)"
            )

            # Create beam_type variable
            beam_type = grp.createVariable("beam_type", int, ())
            beam_type[:] = echodata["Sonar/Beam_group1"]["beam_type"].values[i]
            beam_type.setncattr("long_name", "type of transducer (0-single, 1-split)")

            # Create beamwidth_receive_major variable
            beamwidth_receive_major = grp.createVariable(
                "beamwidth_receive_major", np.float32, ("ping_time", "beam")
            )
            beamwidth_receive_major[:] = beamwidth_receive_major_data[:, i]
            beamwidth_receive_major.setncattr(
                "long_name",
                "Half power one-way receive beam width along major (horizontal) axis of beam",
            )
            beamwidth_receive_major.units = "arc_degree"
            beamwidth_receive_major.valid_range = [0.0, 360.0]

            # stopped here
            # Create beamwidth_receive_minor variable
            beamwidth_receive_minor = grp.createVariable(
                "beamwidth_receive_minor", np.float32, ("ping_time", "beam")
            )
            beamwidth_receive_minor[:] = beamwidth_receive_minor_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            beamwidth_receive_minor.setncattr(
                "long_name",
                "Half power one-way receive beam width along minor (vertical) axis of beam",
            )
            beamwidth_receive_minor.units = "arc_degree"
            beamwidth_receive_minor.valid_range = [0.0, 360.0]

            beamwidth_transmit_major = grp.createVariable(
                "beamwidth_transmit_major", np.float32, ("ping_time", "beam")
            )
            # Create beamwidth_transmit_major variable
            beamwidth_transmit_major[:] = beamwidth_receive_major_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            beamwidth_transmit_major.setncattr(
                "long_name",
                "Half power one-way receive beam width along major (horizontal) axis of beam",
            )
            beamwidth_transmit_major.units = "arc_degree"
            beamwidth_transmit_major.valid_range = [0.0, 360.0]

            # Create beamwidth_transmit_minor variable
            beamwidth_transmit_minor = grp.createVariable(
                "beamwidth_transmit_minor", np.float32, ("ping_time", "beam")
            )
            beamwidth_transmit_minor[:] = beamwidth_receive_minor_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            beamwidth_transmit_minor.setncattr(
                "long_name",
                "Half power one-way receive beam width along minor (vertical) axis of beam",
            )
            beamwidth_transmit_minor.units = "arc_degree"
            beamwidth_transmit_minor.valid_range = [0.0, 360.0]

            # Create blanking_interval variable
            blanking_interval = grp.createVariable(
                "blanking_interval", np.float32, ("ping_time", "beam")
            )
            blanking_interval[:] = np.zeros(
                (echodata["Sonar/Beam_group1"].sizes["ping_time"], 1)
            )
            blanking_interval.setncattr(
                "long_name", "Beam stabilisation applied(or not)"
            )
            blanking_interval.units = "s"
            blanking_interval.valid_min = 0.0

            # Create calibrated_frequency variable
            calibrated_frequency = grp.createVariable(
                "calibrated_frequency", np.float64, ()
            )
            calibrated_frequency[:] = echodata["Sonar/Beam_group1"][
                "frequency_nominal"
            ].values[i]
            calibrated_frequency.setncattr("long_name", "Calibration gain frequencies")
            calibrated_frequency.units = "Hz"
            calibrated_frequency.valid_min = 0.0

            # Create echoangle_major variable (talk to joe about this)
            echoangle_major = grp.createVariable(
                "echoangle_major", np.float32, ("ping_time", "beam")
            )
            echoangle_major[:] = echoangle_major_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            echoangle_major.setncattr(
                "long_name", "Echo arrival angle in the major beam coordinate"
            )
            echoangle_major.units = "arc_degree"
            echoangle_major.valid_range = [-180.0, 180.0]

            # Create echoangle_minor variable
            echoangle_minor = grp.createVariable(
                "echoangle_minor", np.float32, ("ping_time", "beam")
            )
            echoangle_minor[:] = echoangle_minor_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            echoangle_minor.setncattr(
                "long_name", "Echo arrival angle in the minor beam coordinate"
            )
            echoangle_minor.units = "arc_degree"
            echoangle_minor.valid_range = [-180.0, 180.0]

            # Create echoangle_major sensitivity variable
            echoangle_major_sensitivity = grp.createVariable(
                "echoangle_major_sensitivityr", np.float64, ()
            )
            echoangle_major_sensitivity[:] = echodata["Sonar/Beam_group1"][
                "angle_sensitivity_athwartship"
            ].values[i]
            echoangle_major_sensitivity.setncattr(
                "long_name", "Major angle scaling factor"
            )
            echoangle_major_sensitivity.units = "1"
            echoangle_major_sensitivity.valid_min = 0.0

            # Create echoangle_minor sensitivity variable
            echoangle_minor_sensitivity = grp.createVariable(
                "echoangle_minor_sensitivity", np.float64, ()
            )
            echoangle_minor_sensitivity[:] = echodata["Sonar/Beam_group1"][
                "angle_sensitivity_alongship"
            ].values[i]
            echoangle_minor_sensitivity.setncattr(
                "long_name", "Minor angle scaling factor"
            )
            echoangle_minor_sensitivity.units = "1"
            echoangle_minor_sensitivity.valid_min = 0.0

            # Create equivalent_beam_angle variable (weird angle values)
            equivalent_beam_angle = grp.createVariable(
                "equivalent_beam_angle", np.float32, ("ping_time", "beam")
            )
            equivalent_beam_angle[:] = equivalent_beam_angle_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            equivalent_beam_angle.setncattr("long_name", "Equivalent beam angle")

            # Create frequency variable
            frequency = grp.createVariable("frequency", np.float64, ())
            frequency[:] = echodata["Sonar/Beam_group1"]["frequency_nominal"].values[i]
            frequency.setncattr("long_name", "Calibration gain frequencies")
            frequency.units = "Hz"
            frequency.valid_min = 0.0

            # Create non_quantitative_processing variable
            non_quantitative_processing = grp.createVariable(
                "non_quantitative_processing", int, ("ping_time")
            )
            non_quantitative_processing[:] = np.zeros(
                echodata["Sonar/Beam_group1"].sizes["ping_time"]
            )
            non_quantitative_processing.setncattr(
                "long_name",
                "Presence or not of non-quantitative processing applied to the backscattering data (sonar specific)",
            )

            # Create platform_heading variable
            platform_heading = grp.createVariable(
                "platform_heading", np.float32, ("ping_time")
            )
            platform_heading[:] = echodata["Platform"]["heading"].values
            platform_heading.setncattr("long_name", "Platform heading(true)")
            platform_heading.units = "degrees_north"
            platform_heading.valid_range = [0, 360.0]

            # Create platform_latitude variable
            platform_latitude = grp.createVariable(
                "platform_latitude", np.float32, ("ping_time")
            )
            platform_latitude[:] = echodata["Platform"]["latitude"].interp(
                time1=echodata["Platform"].coords["time2"].values, method="nearest"
            )
            platform_latitude.setncattr(
                "long_name", "Heading of the platform at time of the ping"
            )
            platform_latitude.units = "degrees_north"
            platform_latitude.valid_range = [-180.0, 180.0]

            # Create platform_longitude variable
            platform_longitude = grp.createVariable(
                "platform_longitude", np.float64, ("ping_time")
            )
            platform_longitude[:] = echodata["Platform"]["longitude"].interp(
                time1=echodata["Platform"].coords["time2"].values, method="nearest"
            )
            platform_longitude.setncattr("long_name", "longitude")
            platform_longitude.units = "degrees_east"
            platform_longitude.valid_range = [-180.0, 180.0]

            # Create platform_pitch variable
            platform_pitch = grp.createVariable(
                "platform_pitch", np.float64, ("ping_time")
            )
            platform_pitch[:] = echodata["Platform"]["pitch"].values
            platform_pitch.setncattr("long_name", "pitch_angle")
            platform_pitch.units = "arc_degree"
            platform_pitch.valid_range = [-90.0, 90.0]

            # Create platform_roll variable
            platform_roll = grp.createVariable(
                "platform_roll", np.float64, ("ping_time")
            )
            platform_roll[:] = echodata["Platform"]["roll"].values
            platform_roll.setncattr("long_name", "roll angle")
            platform_roll.units = "arc_degree"

            # Create platform_vertical_offset variable
            platform_vertical_offset = grp.createVariable(
                "platform_vertical_offset", np.float64, ("ping_time")
            )
            platform_vertical_offset[:] = echodata["Platform"]["vertical_offset"].values
            platform_vertical_offset.setncattr(
                "long_name",
                "Platform vertical distance from reference point to the water line",
            )
            platform_vertical_offset.units = "m"

            # Create rx_beam_rotation_phi variable
            rx_beam_rotation_phi = grp.createVariable(
                "rx_beam_rotation_phi", np.float32, ("ping_time", "beam")
            )
            rx_beam_rotation_phi[:] = rx_beam_rotation_phi_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            rx_beam_rotation_phi.setncattr(
                "long_name", "receive beam angular rotation about the x axis"
            )
            rx_beam_rotation_phi.units = "arc_degree"
            rx_beam_rotation_phi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_psi variable
            rx_beam_rotation_psi = grp.createVariable(
                "rx_beam_rotation_psi", np.float32, ("ping_time", "beam")
            )
            rx_beam_rotation_psi[:] = rx_beam_rotation_psi_data
            rx_beam_rotation_psi.setncattr(
                "long_name", "receive beam angular rotation about the z axis"
            )
            rx_beam_rotation_psi.units = "arc_degree"
            rx_beam_rotation_psi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_theta variable
            rx_beam_rotation_theta = grp.createVariable(
                "rx_beam_roation_theta", np.float32, ("ping_time", "beam")
            )
            rx_beam_rotation_theta[:] = rx_beam_rotation_theta_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            rx_beam_rotation_theta.setncattr(
                "long_name", "receive beam angular rotation about the y axis"
            )
            rx_beam_rotation_theta.units = "arc_degree"
            rx_beam_rotation_theta.valid_range = [-90.0, 90.0]

            # Create sample_interval variable
            sample_interval = grp.createVariable(
                "sample_interval", np.float64, ("ping_time", "beam")
            )
            sample_interval[:] = (
                echodata["Sonar/Beam_group1"]["sample_interval"]
                .transpose()
                .values[:, i]
            )
            sample_interval.setncattr("long_name", "Equivalent beam angle")
            sample_interval.units = "s"
            sample_interval.valid_min = 0.0
            sample_interval.coordinates = (
                "ping_time platform_latitude platform_longitude"
            )

            # Create sample_time_offset variable
            sample_time_offset = grp.createVariable(
                "sample_time_offset", np.float32, ("ping_time", "beam")
            )
            sample_time_offset[:] = (
                echodata["Sonar/Beam_group1"]["sample_time_offset"]
                .transpose()
                .values[:, i]
            )
            sample_time_offset.setncattr(
                "long_name",
                "Time offset that is subtracted from the timestamp of each sample",
            )
            sample_time_offset.units = "s"

            # Create transmit_duration_nominal variable
            transmit_duration_nominal = grp.createVariable(
                "transmit_duration_nominal", np.float32, ("ping_time", "beam")
            )
            transmit_duration_nominal[:] = (
                echodata["Sonar/Beam_group1"]["transmit_duration_nominal"]
                .transpose()
                .values[:, i]
                .astype(np.float32)
            )
            transmit_duration_nominal.setncattr(
                "long_name", "Nominal duration of transmitted pulse"
            )
            transmit_duration_nominal.units = "Hz"
            transmit_duration_nominal.valid_min = 0.0

            # Create transmit_frequency_start variable
            transmit_frequency_start = grp.createVariable(
                "transmit_frequency_start", np.float32, ("ping_time", "beam")
            )
            transmit_frequency_start[:] = (
                echodata["Sonar/Beam_group1"]["transmit_frequency_start"]
                .transpose()
                .values[:, i]
                .astype(np.float32)
            )
            transmit_frequency_start.setncattr(
                "long_name", "Start frequency in transmitted pulse"
            )
            transmit_frequency_start.units = "Hz"
            transmit_frequency_start.valid_min = 0.0

            # Create transmit_frequency_stop variable
            transmit_frequency_stop = grp.createVariable(
                "transmit_frequency_stop", np.float32, ("ping_time", "beam")
            )
            transmit_frequency_stop[:] = (
                echodata["Sonar/Beam_group1"]["transmit_frequency_stop"]
                .transpose()
                .values[:, i]
                .astype(np.float32)
            )
            transmit_frequency_stop.setncattr(
                "long_name", "Stop frequency in transmitted pulse"
            )
            transmit_frequency_stop.units = "Hz"
            transmit_frequency_stop.valid_min = 0.0

            # Create transmit_power variable
            transmit_power = grp.createVariable(
                "transmit_power", np.float32, ("ping_time", "beam")
            )
            transmit_power[:] = (
                echodata["Sonar/Beam_group1"]["transmit_power"]
                .transpose()
                .values[:, i]
                .astype(np.float32)
            )
            transmit_power.setncattr("long_name", "Nominal transmit power")
            transmit_power.units = "W"
            transmit_power.valid_min = 0.0

            # Create transmit_type
            transmit_type = grp.createVariable(
                "transmit_type", np.float32, ("ping_time", "beam")
            )
            transmit_type[:] = (
                echodata["Sonar/Beam_group1"]["transmit_type"]
                .where(echodata["Sonar/Beam_group1"]["transmit_type"] != "CW", 0)
                .where(echodata["Sonar/Beam_group1"]["transmit_type"] != "LFM", 1)
                .transpose()
                .values[:, i]
                .astype(np.float32)
            )
            transmit_type.setncattr("long_name", "Type of transmitted pulse")

            # Create tx_beam_rotation_phi variable
            tx_beam_roation_phi = grp.createVariable(
                "tx_beam_roation_phi", np.float32, ("ping_time", "beam")
            )
            tx_beam_roation_phi[:] = rx_beam_rotation_phi_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            tx_beam_roation_phi.setncattr(
                "long_name", "receive beam angular rotation about the x axis"
            )
            tx_beam_roation_phi.units = "arc_degree"
            tx_beam_roation_phi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_psi variable
            tx_beam_roation_psi = grp.createVariable(
                "tx_beam_roation_psi", np.float32, ("ping_time", "beam")
            )
            tx_beam_roation_psi[:] = rx_beam_rotation_psi_data
            tx_beam_roation_psi.setncattr(
                "long_name", "receive beam angular rotation about the z axis"
            )
            tx_beam_roation_psi.units = "arc_degree"
            tx_beam_roation_psi.valid_range = [-180.0, 180.0]

            # Create rx_beam_rotation_theta variable
            tx_beam_roation_theta = grp.createVariable(
                "tx_beam_roation_theta", np.float32, ("ping_time", "beam")
            )
            tx_beam_roation_theta[:] = rx_beam_rotation_theta_data[:, i].reshape(
                echodata["Sonar/Beam_group1"].sizes["ping_time"], 1
            )
            tx_beam_roation_theta.setncattr(
                "long_name", "receive beam angular rotation about the y axis"
            )
            tx_beam_roation_theta.units = "arc_degree"
            tx_beam_roation_theta.valid_range = [-90.0, 90.0]
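
Unlike the EK60 writer, each EK80 Beam_groupX group carries a sub_beam dimension (the four transducer quadrants) plus real and imaginary backscatter. A minimal sketch, with a placeholder path, of reading the ragged backscatter back out of a file produced by this writer:

# Illustrative read-back sketch (path is a placeholder): each element of the
# VLEN backscatter_r variable is a variable-length float32 vector, one per
# sub_beam of the selected ping.
import netCDF4

with netCDF4.Dataset("example_ices.nc", "r") as nc:
    grp = nc["Sonar/Beam_group1"]
    first_ping = grp["backscatter_r"][0, 0, :]  # one vector per sub_beam
    print([segment.shape for segment in first_ping])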

nc_reader

This file is used to get header information out of a NetCDF file. The code reads a .nc file and returns a dict with all of the attributes gathered.

Functions:

Name Description
get_netcdf_header

Reads a NetCDF file and returns its header as a dictionary.

get_netcdf_header(file_path)

Reads a NetCDF file and returns its header as a dictionary.

Parameters:

Name Type Description Default
file_path str

Path to the NetCDF file.

required

Returns:

Name Type Description
dict dict

Dictionary containing global attributes, dimensions, and

dict

variables.

Source code in src\aalibrary\utils\nc_reader.py
def get_netcdf_header(file_path: str) -> dict:
    """Reads a NetCDF file and returns its header as a dictionary.

    Args:
        file_path (str): Path to the NetCDF file.

    Returns:
        dict: Dictionary containing global attributes, dimensions, and
        variables.
    """
    header_info = {}

    with Dataset(file_path, "r") as nc_file:
        # Extract global attributes
        header_info["global_attributes"] = {
            attr: getattr(nc_file, attr) for attr in nc_file.ncattrs()
        }

        # Extract dimensions
        header_info["dimensions"] = {
            dim: len(nc_file.dimensions[dim]) for dim in nc_file.dimensions
        }

        # Extract variable metadata
        header_info["variables"] = {
            var: {
                "dimensions": nc_file.variables[var].dimensions,
                "shape": nc_file.variables[var].shape,
                "dtype": str(nc_file.variables[var].dtype),
                "attributes": {
                    attr: getattr(nc_file.variables[var], attr)
                    for attr in nc_file.variables[var].ncattrs()
                },
            }
            for var in nc_file.variables
        }

    return header_info
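
A minimal usage sketch (the file path is a placeholder; the import path follows the module location shown above):

from aalibrary.utils.nc_reader import get_netcdf_header

header = get_netcdf_header("example.nc")  # hypothetical NetCDF file
print(header["global_attributes"])
print(header["dimensions"])
print(sorted(header["variables"].keys()))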

ncei_cache_daily_script

Script to get all objects in the NCEI S3 bucket and cache them to BigQuery. Ideally, it would run every time a file is updated; instead, it is set to run daily via a cron job.

Cron job command: 0 1 * * * /usr/bin/python3 /path/to/aalibrary/src/aalibrary/utils/test.py

ncei_utils

This file contains code pertaining to auxiliary functions related to parsing through NCEI's s3 bucket.

Functions:

Name Description
check_if_tugboat_metadata_json_exists_in_survey

Checks whether a Tugboat metadata JSON file exists within a survey.

download_single_file_from_aws

Safely downloads a file from the AWS storage bucket, aka the NCEI

download_specific_folder_from_ncei

Downloads a specific folder and all of its contents from NCEI to a local

get_all_echosounders_in_a_survey

Gets all of the echosounders in a particular survey from NCEI.

get_all_echosounders_that_exist_in_ncei

Gets a list of all possible echosounders from NCEI.

get_all_file_names_from_survey

Gets all of the file names from a particular NCEI survey.

get_all_file_names_in_a_surveys_echosounder_folder

Gets all of the file names from a particular NCEI survey's echosounder

get_all_metadata_files_in_survey

Gets all of the metadata file names from a particular NCEI survey.

get_all_raw_file_names_from_survey

Gets all of the file names from a particular NCEI survey.

get_all_ship_names_in_ncei

Gets all of the ship names from NCEI. This is based on all of the

get_all_survey_names_from_a_ship

Gets a list of all of the survey names that exist under a ship name.

get_all_surveys_in_ncei

Gets a list of all of the possible survey names from NCEI.

get_checksum_sha256_from_s3

Gets the SHA-256 checksum of the s3 object.

get_closest_ncei_formatted_ship_name

Gets the closest NCEI formatted ship name to the given ship name.

get_echosounder_from_raw_file

Gets the echosounder used for a particular raw file.

get_file_size_from_s3

Gets the file size of an object in s3.

get_folder_size_from_s3

Gets the folder size in bytes from S3.

get_random_raw_file_from_ncei

Creates a test raw file for NCEI. This is used for testing purposes

search_ncei_file_objects_for_string

Searches NCEI for a file type's object keys that contain a particular

search_ncei_objects_for_string

Searches NCEI for object keys that contain a particular string. This

check_if_tugboat_metadata_json_exists_in_survey(ship_name='', survey_name='', s3_bucket=None)

Checks whether a Tugboat metadata JSON file exists within a survey. Returns the file's object key or None if it does not exist.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Defaults to None. NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use the get_all_ship_names_in_ncei function to see all possible NCEI ship names.

''
survey_name str

The survey name exactly as it is in NCEI. Defaults to "".

''
s3_bucket resource

The bucket resource object. Defaults to None.

None

Returns: Union[str, None]: Returns the file's object key string or None if it does not exist.

Source code in src\aalibrary\utils\ncei_utils.py
def check_if_tugboat_metadata_json_exists_in_survey(
    ship_name: str = "",
    survey_name: str = "",
    s3_bucket: boto3.resource = None,
) -> Union[str, None]:
    """Checks whether a Tugboat metadata JSON file exists within a survey.
    Returns the file's object key or None if it does not exist.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Defaults to None.
            NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use
            the `get_all_ship_names_in_ncei` function to see all possible NCEI
            ship names.
        survey_name (str, optional): The survey name exactly as it is in NCEI.
            Defaults to "".
        s3_bucket (boto3.resource, optional): The bucket resource object.
            Defaults to None.
    Returns:
        Union[str, None]: Returns the file's object key string or None if it
            does not exist.
    """

    # Find all metadata files within the metadata/ folder in NCEI
    all_metadata_obj_keys = list_all_objects_in_s3_bucket_location(
        prefix=f"data/raw/{ship_name}/{survey_name}/metadata",
        s3_resource=s3_bucket,
    )

    for obj_key, file_name in all_metadata_obj_keys:
        # Handle for main metadata file for upload to BigQuery.
        if file_name.endswith("metadata.json"):
            return obj_key

    return None
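
A minimal usage sketch. The bucket object comes from create_s3_objs in cloud_utils (which, as used elsewhere in this module, returns the client, resource, and bucket); the survey name below is a placeholder.

from aalibrary.utils.cloud_utils import create_s3_objs
from aalibrary.utils.ncei_utils import check_if_tugboat_metadata_json_exists_in_survey

_, _, s3_bucket = create_s3_objs()
obj_key = check_if_tugboat_metadata_json_exists_in_survey(
    ship_name="Reuben_Lasker",  # must match the NCEI spelling exactly
    survey_name="RL2107",       # hypothetical survey name
    s3_bucket=s3_bucket,
)
print(obj_key)  # object key of the metadata.json, or None if it does not exist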

download_single_file_from_aws(file_url='', download_location='')

Safely downloads a file from the AWS storage bucket, aka the NCEI repository.

Parameters:

Name Type Description Default
file_url str

The file url. Defaults to "".

''
download_location str

The local download location for the file. Defaults to "".

''
Source code in src\aalibrary\utils\ncei_utils.py
def download_single_file_from_aws(
    file_url: str = "",
    download_location: str = "",
):
    """Safely downloads a file from AWS storage bucket, aka the NCEI
    repository.

    Args:
        file_url (str, optional): The file url. Defaults to "".
        download_location (str, optional): The local download location for the
            file. Defaults to "".
    """

    try:
        _, s3_resource, s3_bucket = create_s3_objs()
    except Exception as e:
        logging.error("CANNOT ESTABLISH CONNECTION TO S3 BUCKET..\n{%s}", e)
        raise

    # We replace the beginning of common file paths
    file_url = get_object_key_for_s3(file_url=file_url)
    file_name = get_file_name_from_url(file_url)

    # Check if the file exists in s3
    file_exists = check_if_file_exists_in_s3(
        object_key=file_url,
        s3_resource=s3_resource,
        s3_bucket_name=s3_bucket.name,
    )

    if file_exists:
        # Finally download the file.
        try:
            logging.info("DOWNLOADING `%s`...", file_name)
            s3_bucket.download_file(file_url, download_location)
            logging.info(
                "DOWNLOADED `%s` TO `%s`", file_name, download_location
            )
        except Exception as e:
            logging.error(
                "ERROR DOWNLOADING FILE `%s` DUE TO\n%s", file_name, e
            )
            raise
    else:
        logging.error(
            "FILE %s DOES NOT EXIST IN NCEI S3 BUCKET. SKIPPING...", file_name
        )

download_specific_folder_from_ncei(folder_prefix='', download_directory='', debug=False)

Downloads a specific folder and all of its contents from NCEI to a local directory.

Parameters:

Name Type Description Default
folder_prefix str

The folder's path in the s3 bucket. Ex. 'data/raw/Reuben_Lasker/' Defaults to "".

''
download_directory str

The directory you want to download the folder and all of its contents to. Defaults to "".

''
debug bool

Whether or not to print debug information. Defaults to False.

False
Source code in src\aalibrary\utils\ncei_utils.py
def download_specific_folder_from_ncei(
    folder_prefix: str = "", download_directory: str = "", debug: bool = False
):
    """Downloads a specific folder and all of its contents from NCEI to a local
    directory.

    Args:
        folder_prefix (str, optional): The folder's path in the s3 bucket.
            Ex. 'data/raw/Reuben_Lasker/'
            Defaults to "".
        download_directory (str, optional): The directory you want to download
            the folder and all of its contents to. Defaults to "".
        debug (bool, optional): Whether or not to print debug information.
            Defaults to False.
    """

    if not folder_prefix.endswith("/"):
        folder_prefix += "/"

    assert (download_directory is not None) and (
        download_directory != ""
    ), "You must provide a download_directory to download the folder to."

    if debug:
        logging.debug("FORMATTED DOWNLOAD DIRECTORY: %s", download_directory)

    # Get all s3 objects for the survey
    print(f"GETTING ALL S3 OBJECTS FOR FOLDER `{folder_prefix}`...")
    _, s3_resource, _ = create_s3_objs()
    s3_objects = list_all_objects_in_s3_bucket_location(
        prefix=folder_prefix,
        s3_resource=s3_resource,
        return_full_paths=True,
    )
    print(f"FOUND {len(s3_objects)} FILES.")

    subdirs = set()
    # Get the subfolders from object keys
    for s3_object in s3_objects:
        # Skip folders
        if s3_object.endswith("/"):
            continue
        # Get the subfolder structure from the object key
        subfolder_key = os.sep.join(
            s3_object.replace("data/raw/", "").split("/")[:-1]
        )
        subdirs.add(subfolder_key)
    for subdir in subdirs:
        os.makedirs(os.sep.join([download_directory, subdir]), exist_ok=True)

    # Create the directory if it doesn't exist.
    if not os.path.isdir(download_directory):
        print(f"CREATING download_directory `{download_directory}`")
        os.makedirs(download_directory, exist_ok=True)
    # normalize the path
    download_directory = os.path.normpath(download_directory)
    print("CREATED DOWNLOAD SUBDIRECTORIES.")

    for idx, object_key in enumerate(tqdm(s3_objects, desc="Downloading")):
        file_name = object_key.split("/")[-1]
        local_object_path = object_key.replace("data/raw/", "")
        download_location = os.path.normpath(
            os.sep.join([download_directory, local_object_path])
        )
        download_single_file_from_aws(
            file_url=object_key, download_location=download_location
        )
    print(f"DOWNLOAD COMPLETE {os.path.abspath(download_directory)}.")

get_all_echosounders_in_a_survey(ship_name='', survey_name='', s3_client=None, return_full_paths=False)

Gets all of the echosounders in a particular survey from NCEI.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Defaults to None. NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use the get_all_ship_names_in_ncei function to see all possible NCEI ship names.

''
survey_name str

The survey name exactly as it is in NCEI. Defaults to "".

''
s3_client client

The client used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings, each being the echosounder name. Whether these are full paths or just folder names are specified by the return_full_paths parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_echosounders_in_a_survey(
    ship_name: str = "",
    survey_name: str = "",
    s3_client: boto3.client = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the echosounders in a particular survey from NCEI.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Defaults to None.
            NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use
            the `get_all_ship_names_in_ncei` function to see all possible NCEI
            ship names.
        survey_name (str, optional): The survey name exactly as it is in NCEI.
            Defaults to "".
        s3_client (boto3.client, optional): The client used to perform this
            operation. Defaults to None, but creates a client for you instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.

    Returns:
        List[str]: A list of strings, each being the echosounder name. Whether
            these are full paths or just folder names are specified by the
            `return_full_paths` parameter.
    """

    survey_prefix = f"data/raw/{ship_name}/{survey_name}/"
    all_survey_folder_names = get_subdirectories_in_s3_bucket_location(
        prefix=survey_prefix,
        s3_client=s3_client,
        return_full_paths=return_full_paths,
        bucket_name="noaa-wcsd-pds",
    )
    # Get echosounder folders by ignoring the other metadata folders
    all_echosounders = []
    for folder_name in all_survey_folder_names:
        if (
            ("calibration" not in folder_name.lower())
            and ("metadata" not in folder_name.lower())
            and ("json" not in folder_name.lower())
            and ("doc" not in folder_name.lower())
        ):
            all_echosounders.append(folder_name)

    return all_echosounders
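
A minimal usage sketch (the survey name is a placeholder):

from aalibrary.utils.ncei_utils import get_all_echosounders_in_a_survey

echosounders = get_all_echosounders_in_a_survey(
    ship_name="Reuben_Lasker",  # must match the NCEI folder spelling
    survey_name="RL2107",       # placeholder survey name
)
print(echosounders)             # list of echosounder folder names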

get_all_echosounders_that_exist_in_ncei(s3_client=None)

Gets a list of all possible echosounders from NCEI.

Parameters:

Name Type Description Default
s3_client client

The client used to perform this operation. Defaults to None, but creates a client for you instead.

None

Returns:

Type Description
List[str]

List[str]: A list of strings, each being a unique echosounder name found across all NCEI surveys.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_echosounders_that_exist_in_ncei(
    s3_client: boto3.client = None,
) -> List[str]:
    """Gets a list of all possible echosounders from NCEI.

    Args:
        s3_client (boto3.client, optional): The client used to perform this
            operation. Defaults to None, but creates a client for you instead.

    Returns:
        List[str]: A list of strings, each being a unique echosounder name
            found across all NCEI surveys.
    """

    # Create client objects if they dont exist.
    if s3_client is None:
        s3_client, _, _ = create_s3_objs()

    # First we get all of the prefixes for each survey to exist in NCEI.
    all_survey_prefixes = get_all_surveys_in_ncei(
        s3_client=s3_client, return_full_paths=True
    )
    all_echosounders = set()
    for survey_prefix in tqdm(
        all_survey_prefixes, desc="Getting Echosounders"
    ):
        # Remove trailing `/`
        survey_prefix = survey_prefix.strip("/")
        survey_name = survey_prefix.split("/")[-1]
        ship_name = survey_prefix.split("/")[-2]
        survey_echosounders = get_all_echosounders_in_a_survey(
            ship_name=ship_name,
            survey_name=survey_name,
            s3_client=s3_client,
            return_full_paths=False,
        )
        all_echosounders.update(survey_echosounders)

    return list(all_echosounders)

get_all_file_names_from_survey(ship_name='', survey_name='', s3_resource=None, return_full_paths=False)

Gets all of the file names from a particular NCEI survey.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Defaults to None. NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use the get_all_ship_names_in_ncei function to see all possible NCEI ship names.

''
survey_name str

The survey name exactly as it is in NCEI. Defaults to "".

''
s3_resource resource

The resource used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings, each being a file name. Whether these are full paths or just file names is specified by the return_full_paths parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_file_names_from_survey(
    ship_name: str = "",
    survey_name: str = "",
    s3_resource: boto3.resource = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the file names from a particular NCEI survey.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Defaults to None.
            NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use
            the `get_all_ship_names_in_ncei` function to see all possible NCEI
            ship names.
        survey_name (str, optional): The survey name exactly as it is in NCEI.
            Defaults to "".
        s3_resource (boto3.resource, optional): The resource used to perform
            this operation. Defaults to None, but creates a client for you
            instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.

    Returns:
        List[str]: A list of strings, each being a file name. Whether
            these are full paths or just file names is specified by the
            `return_full_paths` parameter.
    """

    survey_prefix = f"data/raw/{ship_name}/{survey_name}/"
    all_files = list_all_objects_in_s3_bucket_location(
        prefix=survey_prefix,
        s3_resource=s3_resource,
        return_full_paths=return_full_paths,
    )
    return all_files
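
Example usage (illustrative; the survey name is a placeholder):

from aalibrary.utils.ncei_utils import get_all_file_names_from_survey

all_files = get_all_file_names_from_survey(
    ship_name="Reuben_Lasker",
    survey_name="RL2107",     # placeholder survey name
    return_full_paths=True,   # return full object keys instead of bare names
)
print(len(all_files))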

get_all_file_names_in_a_surveys_echosounder_folder(ship_name='', survey_name='', echosounder='', s3_resource=None, return_full_paths=False)

Gets all of the file names from a particular NCEI survey's echosounder folder.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Defaults to None. NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use the get_all_ship_names_in_ncei function to see all possible NCEI ship names.

''
survey_name str

The survey name exactly as it is in NCEI. Defaults to "".

''
echosounder str

The echosounder used. Defaults to "".

''
s3_resource resource

The resource used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings, each being the file name. Whether these are full paths or just file names are specified by the return_full_paths parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_file_names_in_a_surveys_echosounder_folder(
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    s3_resource: boto3.resource = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the file names from a particular NCEI survey's echosounder
    folder.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Defaults to None.
            NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use
            the `get_all_ship_names_in_ncei` function to see all possible NCEI
            ship names.
        survey_name (str, optional): The survey name exactly as it is in NCEI.
            Defaults to "".
        echosounder (str, optional): The echosounder used. Defaults to "".
        s3_resource (boto3.resource, optional): The resource used to perform
            this operation. Defaults to None, but creates a client for you
            instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.

    Returns:
        List[str]: A list of strings, each being the file name. Whether
            these are full paths or just file names are specified by the
            `return_full_paths` parameter.
    """

    survey_prefix = f"data/raw/{ship_name}/{survey_name}/{echosounder}/"
    all_files = list_all_objects_in_s3_bucket_location(
        prefix=survey_prefix,
        s3_resource=s3_resource,
        return_full_paths=return_full_paths,
    )
    return all_files

get_all_metadata_files_in_survey(ship_name='', survey_name='', s3_resource=None, return_full_paths=False)

Gets all of the metadata file names from a particular NCEI survey.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Defaults to None. NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use the get_all_ship_names_in_ncei function to see all possible NCEI ship names.

''
survey_name str

The survey name exactly as it is in NCEI. Defaults to "".

''
s3_resource resource

The resource used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings, each being the metadata file name. Whether these are full paths or just folder names are specified by the return_full_paths parameter. Returns empty list '[]' if no metadata files are present.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_metadata_files_in_survey(
    ship_name: str = "",
    survey_name: str = "",
    s3_resource: boto3.resource = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the metadata file names from a particular NCEI survey.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Defaults to None.
            NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use
            the `get_all_ship_names_in_ncei` function to see all possible NCEI
            ship names.
        survey_name (str, optional): The survey name exactly as it is in NCEI.
            Defaults to "".
        s3_resource (boto3.resource, optional): The resource used to perform
            this operation. Defaults to None, but creates a client for you
            instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.

    Returns:
        List[str]: A list of strings, each being the metadata file name.
            Whether these are full paths or just folder names are specified by
            the `return_full_paths` parameter. Returns empty list '[]' if no
            metadata files are present.
    """

    survey_prefix = f"data/raw/{ship_name}/{survey_name}/metadata/"
    all_metadata_files = list_all_objects_in_s3_bucket_location(
        prefix=survey_prefix,
        s3_resource=s3_resource,
        return_full_paths=return_full_paths,
    )
    return all_metadata_files

get_all_raw_file_names_from_survey(ship_name='', survey_name='', echosounder='', s3_resource=None, return_full_paths=False)

Gets all of the raw (.raw) file names from a particular NCEI survey's echosounder folder.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Defaults to None. NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use the get_all_ship_names_in_ncei function to see all possible NCEI ship names.

''
survey_name str

The survey name exactly as it is in NCEI. Defaults to "".

''
echosounder str

The echosounder used. Defaults to "".

''
s3_resource resource

The resource used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False

Returns:

Type Description
List[str]

List[str]: A list of strings, each being the raw file name. Whether these are full paths or just folder names are specified by the return_full_paths parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_raw_file_names_from_survey(
    ship_name: str = "",
    survey_name: str = "",
    echosounder: str = "",
    s3_resource: boto3.resource = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets all of the file names from a particular NCEI survey.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Defaults to None.
            NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use
            the `get_all_ship_names_in_ncei` function to see all possible NCEI
            ship names.
        survey_name (str, optional): The survey name exactly as it is in NCEI.
            Defaults to "".
        echosounder (str, optional): The echosounder used. Defaults to "".
        s3_resource (boto3.resource, optional): The resource used to perform
            this operation. Defaults to None, but creates a client for you
            instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.

    Returns:
        List[str]: A list of strings, each being the raw file name. Whether
            these are full paths or just folder names are specified by the
            `return_full_paths` parameter.
    """

    survey_prefix = f"data/raw/{ship_name}/{survey_name}/{echosounder}/"
    all_files = list_all_objects_in_s3_bucket_location(
        prefix=survey_prefix,
        s3_resource=s3_resource,
        return_full_paths=return_full_paths,
    )
    all_files = [file for file in all_files if file.endswith(".raw")]
    return all_files
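
Example usage (illustrative; the survey name and echosounder folder are placeholders):

from aalibrary.utils.ncei_utils import get_all_raw_file_names_from_survey

raw_files = get_all_raw_file_names_from_survey(
    ship_name="Reuben_Lasker",
    survey_name="RL2107",   # placeholder survey name
    echosounder="EK80",     # placeholder echosounder folder
)
print(len(raw_files))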

get_all_ship_names_in_ncei(normalize=False, s3_client=None, return_full_paths=False)

Gets all of the ship names from NCEI. This is based on all of the folders listed under the data/raw/ prefix.

Parameters:

Name Type Description Default
normalize bool

Whether or not to normalize the ship_name attribute to how GCP stores it. Defaults to False.

False
s3_client client

The client used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False
Source code in src\aalibrary\utils\ncei_utils.py
def get_all_ship_names_in_ncei(
    normalize: bool = False,
    s3_client: boto3.client = None,
    return_full_paths: bool = False,
):
    """Gets all of the ship names from NCEI. This is based on all of the
    folders listed under the `data/raw/` prefix.

    Args:
        normalize (bool, optional): Whether or not to normalize the ship_name
            attribute to how GCP stores it. Defaults to False.
        s3_client (boto3.client, optional): The client used to perform this
            operation. Defaults to None, but creates a client for you instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.
    """

    # Create client objects if they dont exist.
    if s3_client is None:
        s3_client, _, _ = create_s3_objs()

    # Get the initial subdirs
    prefix = "data/raw/"
    subdirs = get_subdirectories_in_s3_bucket_location(
        prefix=prefix, s3_client=s3_client, return_full_paths=return_full_paths
    )
    if normalize:
        subdirs = [normalize_ship_name(ship_name=subdir) for subdir in subdirs]
    return subdirs
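
Example usage (illustrative):

from aalibrary.utils.ncei_utils import get_all_ship_names_in_ncei

ship_names = get_all_ship_names_in_ncei(normalize=False)
print(len(ship_names))
print(sorted(ship_names)[:5])   # first few NCEI ship folder names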

get_all_survey_names_from_a_ship(ship_name='', s3_client=None, return_full_paths=False)

Gets a list of all of the survey names that exist under a ship name.

Parameters:

Name Type Description Default
ship_name str

The ship's name you want to get all surveys from. Defaults to None. NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use the get_all_ship_names_in_ncei function to see all possible NCEI ship names.

''
s3_client client

The client used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False

Returns: List[str]: A list of strings, each being the survey name. Whether these are full paths or just folder names are specified by the return_full_paths parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_survey_names_from_a_ship(
    ship_name: str = "",
    s3_client: boto3.client = None,
    return_full_paths: bool = False,
) -> List[str]:
    """Gets a list of all of the survey names that exist under a ship name.

    Args:
        ship_name (str, optional): The ship's name you want to get all surveys
            from. Defaults to None.
            NOTE: The ship's name MUST be spelled exactly as it is in NCEI. Use
            the `get_all_ship_names_in_ncei` function to see all possible NCEI
            ship names.
        s3_client (boto3.client, optional): The client used to perform this
            operation. Defaults to None, but creates a client for you instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.
    Returns:
        List[str]: A list of strings, each being the survey name. Whether
            these are full paths or just folder names are specified by the
            `return_full_paths` parameter.
    """
    # Create client objects if they dont exist.
    if s3_client is None:
        s3_client, _, _ = create_s3_objs()

    # Make sure the ship name is valid
    all_ship_names = get_all_ship_names_in_ncei(
        normalize=False, s3_client=s3_client, return_full_paths=False
    )
    if ship_name not in all_ship_names:
        close_matches = get_close_matches(
            ship_name, all_ship_names, n=3, cutoff=0.6
        )
    assert ship_name in all_ship_names, (
        f"The ship name provided `{ship_name}` "
        "needs to be spelled exactly like in NCEI.\n"
        "Use the `get_all_ship_names_in_ncei` function to see all possible "
        "NCEI ship names.\n"
        f"Did you mean one of these possible ship names?\n{close_matches}"
    )

    ship_prefix = f"data/raw/{ship_name}/"
    all_surveys = set()
    # Get a list of all of this ship's survey names
    all_ship_survey_names = get_subdirectories_in_s3_bucket_location(
        prefix=ship_prefix,
        s3_client=s3_client,
        return_full_paths=return_full_paths,
        bucket_name="noaa-wcsd-pds",
    )
    all_surveys.update(all_ship_survey_names)
    return list(all_surveys)
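
Example usage (illustrative; the ship name must match the NCEI spelling exactly, otherwise the assertion above suggests close matches):

from aalibrary.utils.ncei_utils import get_all_survey_names_from_a_ship

surveys = get_all_survey_names_from_a_ship(ship_name="Reuben_Lasker")
print(sorted(surveys))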

get_all_surveys_in_ncei(s3_client=None, return_full_paths=False)

Gets a list of all of the possible survey names from NCEI.

Parameters:

Name Type Description Default
s3_client client

The client used to perform this operation. Defaults to None, but creates a client for you instead.

None
return_full_paths bool

Whether or not you want a full path from bucket root to the subdirectory returned. Set to false if you only want the subdirectory names listed. Defaults to False.

False

Returns: List[str]: A list of strings, each being the survey name. Whether these are full paths or just folder names are specified by the return_full_paths parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def get_all_surveys_in_ncei(
    s3_client: boto3.client = None, return_full_paths: bool = False
) -> List[str]:
    """Gets a list of all of the possible survey names from NCEI.

    Args:
        s3_client (boto3.client, optional): The client used to perform this
            operation. Defaults to None, but creates a client for you instead.
        return_full_paths (bool, optional): Whether or not you want a full
            path from bucket root to the subdirectory returned. Set to false
            if you only want the subdirectory names listed. Defaults to False.
    Returns:
        List[str]: A list of strings, each being the survey name. Whether
            these are full paths or just folder names are specified by the
            `return_full_paths` parameter.
    """

    # Create client objects if they dont exist.
    if s3_client is None:
        s3_client, _, _ = create_s3_objs()

    # First we get all of the prefixes for each ship.
    all_ship_prefixes = get_all_ship_names_in_ncei(
        normalize=False, s3_client=s3_client, return_full_paths=True
    )
    all_surveys = set()
    for ship_prefix in tqdm(all_ship_prefixes, desc="Getting Surveys"):
        # Get a list of all of this ship's survey names
        all_ship_survey_names = get_subdirectories_in_s3_bucket_location(
            prefix=ship_prefix,
            s3_client=s3_client,
            return_full_paths=return_full_paths,
            bucket_name="noaa-wcsd-pds",
        )
        all_surveys.update(all_ship_survey_names)
    return list(all_surveys)

get_checksum_sha256_from_s3(object_key, s3_resource)

Gets the SHA-256 checksum of the s3 object.

Source code in src\aalibrary\utils\ncei_utils.py
def get_checksum_sha256_from_s3(object_key, s3_resource):
    """Gets the SHA-256 checksum of the s3 object."""
    obj = s3_resource.Object("noaa-wcsd-pds", object_key)
    checksum = obj.checksum_sha256
    return checksum
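
Example usage (illustrative; the object key is a placeholder, and the returned value may be None if no SHA-256 checksum is stored for that object):

from aalibrary.utils.cloud_utils import create_s3_objs
from aalibrary.utils.ncei_utils import get_checksum_sha256_from_s3

_, s3_resource, _ = create_s3_objs()
checksum = get_checksum_sha256_from_s3(
    object_key="data/raw/Reuben_Lasker/RL2107/EK80/example.raw",  # placeholder
    s3_resource=s3_resource,
)
print(checksum)  # may be None if the object has no stored SHA-256 checksum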

get_closest_ncei_formatted_ship_name(ship_name='', s3_client=None)

Gets the closest NCEI formatted ship name to the given ship name. NOTE: Only use if the data_source=="NCEI".

Parameters:

Name Type Description Default
ship_name str

The ship name to search the closest match for. Defaults to "".

''
s3_client client

The client used to perform this operation. Defaults to None, but creates a client for you instead.

None

Returns:

Type Description
Union[str, None]

Union[str, None]: The NCEI formatted ship name or None, if none matched.

Source code in src\aalibrary\utils\ncei_utils.py
def get_closest_ncei_formatted_ship_name(
    ship_name: str = "",
    s3_client: boto3.client = None,
) -> Union[str, None]:
    """Gets the closest NCEI formatted ship name to the given ship name.
    NOTE: Only use if the `data_source`=="NCEI".

    Args:
        ship_name (str, optional): The ship name to search the closest match
            for.
            Defaults to "".
        s3_client (boto3.client, optional): The client used to perform this
            operation. Defaults to None, but creates a client for you instead.

    Returns:
        Union[str, None]: The NCEI formatted ship name or None, if none
            matched.
    """

    # Create client objects if they dont exist.
    if s3_client is None:
        s3_client, _, _ = create_s3_objs()

    all_ship_names = get_all_ship_names_in_ncei(
        normalize=False, s3_client=s3_client, return_full_paths=False
    )
    close_matches = get_close_matches(
        ship_name, all_ship_names, n=3, cutoff=0.85
    )
    if len(close_matches) >= 1:
        return close_matches[0]
    else:
        return None
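
Example usage (illustrative; the input spelling differs slightly from the NCEI folder name):

from aalibrary.utils.ncei_utils import get_closest_ncei_formatted_ship_name

ncei_name = get_closest_ncei_formatted_ship_name(ship_name="Reuben Lasker")
print(ncei_name)  # closest NCEI spelling (likely "Reuben_Lasker"), or None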

get_echosounder_from_raw_file(file_name='', ship_name='', survey_name='', echosounders=None, s3_client=None, s3_resource=None, s3_bucket=None)

Gets the echosounder used for a particular raw file.

Source code in src\aalibrary\utils\ncei_utils.py
def get_echosounder_from_raw_file(
    file_name: str = "",
    ship_name: str = "",
    survey_name: str = "",
    echosounders: List[str] = None,
    s3_client: boto3.client = None,
    s3_resource: boto3.resource = None,
    s3_bucket: boto3.resource = None,
):
    """Gets the echosounder used for a particular raw file."""

    if (s3_client is None) or (s3_resource is None) or (s3_bucket is None):
        s3_client, s3_resource, s3_bucket = create_s3_objs()

    if echosounders is None:
        echosounders = get_all_echosounders_in_a_survey(
            ship_name=ship_name,
            survey_name=survey_name,
            s3_client=s3_client,
            return_full_paths=False,
        )

    for echosounder in echosounders:
        raw_file_location = (
            f"data/raw/{ship_name}/{survey_name}/{echosounder}/{file_name}"
        )
        raw_file_exists = check_if_file_exists_in_s3(
            object_key=raw_file_location,
            s3_resource=s3_resource,
            s3_bucket_name=s3_bucket.name,
        )
        if raw_file_exists:
            return echosounder

    raise ValueError("An echosounder could not be found for this raw file.")
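
Example usage (illustrative; the raw file name and survey name are placeholders):

from aalibrary.utils.ncei_utils import get_echosounder_from_raw_file

echosounder = get_echosounder_from_raw_file(
    file_name="example.raw",   # placeholder raw file name
    ship_name="Reuben_Lasker",
    survey_name="RL2107",      # placeholder survey name
)
print(echosounder)             # the echosounder folder containing the file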

get_file_size_from_s3(object_key, s3_resource)

Gets the file size of an object in s3.

Source code in src\aalibrary\utils\ncei_utils.py
def get_file_size_from_s3(object_key, s3_resource):
    """Gets the file size of an object in s3."""
    obj = s3_resource.Object("noaa-wcsd-pds", object_key)
    file_size = obj.content_length
    return file_size

get_folder_size_from_s3(folder_prefix, s3_resource)

Gets the folder size in bytes from S3.

Parameters:

Name Type Description Default
folder_prefix str

The object key prefix of the folder in S3.

required
s3_resource resource

The resource used to perform this operation. Defaults to None, but creates a client for you instead.

required

Returns:

Name Type Description
int int

The total size of the folder in bytes.

Source code in src\aalibrary\utils\ncei_utils.py
def get_folder_size_from_s3(
    folder_prefix: str, s3_resource: boto3.resource
) -> int:
    """Gets the folder size in bytes from S3.

    Args:
        folder_prefix (str): The object key prefix of the folder in S3.
        s3_resource (boto3.resource, optional): The resource used to perform
            this operation. Defaults to None, but creates a client for you
            instead.

    Returns:
        int: The total size of the folder in bytes.
    """
    if s3_resource is None:
        _, s3_resource, _ = create_s3_objs()

    # Initialize total size
    total_size = 0

    # Get all objects' keys in the folder
    all_files_object_keys = list_all_objects_in_s3_bucket_location(
        prefix=folder_prefix,
        s3_resource=s3_resource,
        return_full_paths=True,
    )

    for file_object_key in tqdm(
        all_files_object_keys, desc="Calculating Folder Size"
    ):
        total_size += get_file_size_from_s3(
            object_key=file_object_key, s3_resource=s3_resource
        )

    return total_size
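
Example usage (illustrative; the folder prefix is a placeholder survey path):

from aalibrary.utils.cloud_utils import create_s3_objs
from aalibrary.utils.ncei_utils import get_folder_size_from_s3

_, s3_resource, _ = create_s3_objs()
size_bytes = get_folder_size_from_s3(
    folder_prefix="data/raw/Reuben_Lasker/RL2107/",  # placeholder prefix
    s3_resource=s3_resource,
)
print(f"{size_bytes / 1e9:.2f} GB")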

get_random_raw_file_from_ncei()

Selects a random raw file from NCEI and returns the parameters that identify it. This is used for testing purposes only. Retries automatically if an error occurs.

Returns:

Type Description
List[str]

List[str]: A list object with strings denoting each parameter required for creating a raw file object. Ex. [ random_ship_name, random_survey_name, random_echosounder, random_raw_file, ]

Source code in src\aalibrary\utils\ncei_utils.py
def get_random_raw_file_from_ncei() -> List[str]:
    """Creates a test raw file for NCEI. This is used for testing purposes
    only. Retries automatically if an error occurs.

    Returns:
        List[str]: A list object with strings denoting each parameter required
            for creating a raw file object.
            Ex. [
                random_ship_name,
                random_survey_name,
                random_echosounder,
                random_raw_file,
            ]
    """

    try:
        # Get all of the ship names
        all_ship_names = get_all_ship_names_in_ncei(
            normalize=False, return_full_paths=False
        )
        random_ship_name = all_ship_names[randint(0, len(all_ship_names) - 1)]
        # Get all of the surveys for this ship
        all_surveys_for_this_ship = get_all_survey_names_from_a_ship(
            ship_name=random_ship_name, return_full_paths=False
        )
        random_survey_name = all_surveys_for_this_ship[
            randint(0, len(all_surveys_for_this_ship) - 1)
        ]
        # Get all of the echosounders in this survey
        all_echosounders_for_this_survey = get_all_echosounders_in_a_survey(
            ship_name=random_ship_name,
            survey_name=random_survey_name,
            return_full_paths=False,
        )
        random_echosounder = all_echosounders_for_this_survey[
            randint(0, len(all_echosounders_for_this_survey) - 1)
        ]
        # Get all of the raw files in this echosounder
        all_raw_files_in_echosounder = get_all_raw_file_names_from_survey(
            ship_name=random_ship_name,
            survey_name=random_survey_name,
            echosounder=random_echosounder,
            return_full_paths=False,
        )
        random_raw_file = all_raw_files_in_echosounder[
            randint(0, len(all_raw_files_in_echosounder) - 1)
        ]

        return [
            random_ship_name,
            random_survey_name,
            random_echosounder,
            random_raw_file,
        ]
    except Exception:
        return get_random_raw_file_from_ncei()
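
Example usage (illustrative; each call returns a different random combination):

from aalibrary.utils.ncei_utils import get_random_raw_file_from_ncei

ship_name, survey_name, echosounder, raw_file = get_random_raw_file_from_ncei()
print(ship_name, survey_name, echosounder, raw_file)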

search_ncei_file_objects_for_string(search_param='', file_extension='.raw')

Searches NCEI for a file type's object keys that contain a particular string. This string can be anything, such as an echosounder name, ship name, survey name, or even a partial file name. The file type can be specified by the file_extension parameter. NOTE: This function takes a long time to run, as it has to search through ALL of NCEI's objects.

Parameters:

Name Type Description Default
search_param str

The string to search for. Defaults to "".

''
file_extension str

The file extension to filter results by. Defaults to ".raw".

'.raw'

Returns:

Type Description
List[str]

List[str]: A list of strings, each being an object key that contains the search parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def search_ncei_file_objects_for_string(
    search_param: str = "", file_extension: str = ".raw"
) -> List[str]:
    """Searches NCEI for a file type's object keys that contain a particular
    string. This string can be anything, such as an echosounder name,
    ship name, survey name, or even a partial file name. The file type can be
    specified by the file_extension parameter.
    NOTE: This function takes a long time to run, as it has to search through
    ALL of NCEI's objects.

    Args:
        search_param (str, optional): The string to search for. Defaults to "".
        file_extension (str, optional): The file extension to filter results
            by. Defaults to ".raw".

    Returns:
        List[str]: A list of strings, each being an object key that contains
            the search parameter.
    """

    s3_client, _, _ = create_s3_objs()
    paginator = s3_client.get_paginator("list_objects_v2")
    page_iterator = paginator.paginate(Bucket="noaa-wcsd-pds")
    matching_object_keys = []
    objects = page_iterator.search(
        f"Contents[?contains(Key, `{search_param}`)"
        f" && ends_with(Key, `{file_extension}`)][]"
    )
    for item in objects:
        print(item["Key"])
        matching_object_keys.append(item["Key"])
    return matching_object_keys
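
Example usage (illustrative; expect this to take a long time, since every object key in the NCEI bucket is scanned):

from aalibrary.utils.ncei_utils import search_ncei_file_objects_for_string

matching_keys = search_ncei_file_objects_for_string(
    search_param="Reuben_Lasker",  # any substring of an object key
    file_extension=".raw",
)
print(len(matching_keys))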

search_ncei_objects_for_string(search_param='')

Searches NCEI for object keys that contain a particular string. This string can be anything, such as an echosounder name, ship name, survey name, or even a partial file name. NOTE: This function takes a long time to run, as it has to search through ALL of NCEI's objects. NOTE: Use a folder name as the search_param to get all object keys that contain that folder name. (e.g. '/EK80/')

Parameters:

Name Type Description Default
search_param str

The string to search for. Defaults to "".

''

Returns:

Type Description
List[str]

List[str]: A list of strings, each being an object key that contains the search parameter.

Source code in src\aalibrary\utils\ncei_utils.py
def search_ncei_objects_for_string(search_param: str = "") -> List[str]:
    """Searches NCEI for object keys that contain a particular string. This
    string can be anything, such as an echosounder name, ship name,
    survey name, or even a partial file name.
    NOTE: This function takes a long time to run, as it has to search through
    ALL of NCEI's objects.
    NOTE: Use a folder name as the search_param to get all object keys that
    contain that folder name. (e.g. '/EK80/')

    Args:
        search_param (str, optional): The string to search for. Defaults to "".

    Returns:
        List[str]: A list of strings, each being an object key that contains
            the search parameter.
    """

    s3_client, _, _ = create_s3_objs()
    paginator = s3_client.get_paginator("list_objects_v2")
    page_iterator = paginator.paginate(Bucket="noaa-wcsd-pds")
    matching_object_keys = []
    # Vpcs[?contains(`["vpc-blabla1", "vpc-blabla2"]`, VpcId)].OtherKey
    # objects = page_iterator.search(f"
    # Contents[?contains(Key, `{search_param}`) && ends_with(Key, `.raw`)][]")
    objects = page_iterator.search(
        f"Contents[?contains(Key, `{search_param}`)][]"
    )
    # objects = page_iterator.search("Contents[?ends_with(Key, `.csv`)][]")
    for item in objects:
        matching_object_keys.append(item["Key"])
    return matching_object_keys

sonar_checker

Modules:

Name Description
ek_date_conversion

Code originally developed for pyEcholab

ek_raw_io

Code originally developed for pyEcholab

ek_raw_parsers

Code originally developed for pyEcholab

log
misc
sonar_checker

ek_date_conversion

Code originally developed for pyEcholab (https://github.com/CI-CMG/pyEcholab) by Rick Towler rick.towler@noaa.gov at NOAA AFSC.

Contains functions to convert date information.

TODO: merge necessary function into ek60.py or group everything into a class
TODO: fix docstring

Functions:

Name Description
nt_to_unix

:param nt_timestamp_tuple: Tuple of two longs representing the NT date

unix_to_nt

Given a date, return the 2-element tuple used for timekeeping with SIMRAD echosounders

datetime_to_unix(datetime_obj)

:param datetime_obj: datetime object to convert
:type datetime_obj: :class:`datetime.datetime`

:param tz: Timezone to use for converted time -- if None, uses timezone information contained within datetime_obj
:type tz: :class:`datetime.tzinfo`

>>> from pytz import utc
>>> from datetime import datetime
>>> epoch = datetime(1970, 1, 1, tzinfo=utc)
>>> assert datetime_to_unix(epoch) == 0

Source code in src\aalibrary\utils\sonar_checker\ek_date_conversion.py
def datetime_to_unix(datetime_obj):
    """
    :param datetime_obj: datetime object to convert
    :type datetime_obj: :class:`datetime.datetime`

    :param tz: Timezone to use for converted time -- if None, uses timezone
                information contained within datetime_obj
    :type tz: :class:datetime.tzinfo

    >>> from pytz import utc
    >>> from datetime import datetime
    >>> epoch = datetime(1970, 1, 1, tzinfo=utc)
    >>> assert datetime_to_unix(epoch) == 0
    """

    timestamp = (datetime_obj - UTC_UNIX_EPOCH).total_seconds()

    return timestamp

nt_to_unix(nt_timestamp_tuple, return_datetime=True)

:param nt_timestamp_tuple: Tuple of two longs representing the NT date
:type nt_timestamp_tuple: (long, long)

:param return_datetime: Return a datetime object instead of float
:type return_datetime: bool

Returns a datetime.datetime object w/ UTC timezone calculated from the nt time tuple

lowDateTime, highDateTime = nt_timestamp_tuple

The timestamp is a 64bit count of 100ns intervals since the NT epoch broken into two 32bit longs, least significant first:

>>> dt = nt_to_unix((19496896L, 30196149L))
>>> match_dt = datetime.datetime(2011, 12, 23, 20, 54, 3, 964000, pytz_utc)
>>> assert abs(dt - match_dt) <= dt.resolution

Source code in src\aalibrary\utils\sonar_checker\ek_date_conversion.py
def nt_to_unix(nt_timestamp_tuple, return_datetime=True):
    """
    :param nt_timestamp_tuple: Tuple of two longs representing the NT date
    :type nt_timestamp_tuple: (long, long)

    :param return_datetime:  Return a datetime object instead of float
    :type return_datetime: bool


    Returns a datetime.datetime object w/ UTC timezone
    calculated from the nt time tuple

    lowDateTime, highDateTime = nt_timestamp_tuple

    The timestamp is a 64bit count of 100ns intervals since the NT epoch
    broken into two 32bit longs, least significant first:

    >>> dt = nt_to_unix((19496896L, 30196149L))
    >>> match_dt = datetime.datetime(2011, 12, 23, 20, 54, 3, 964000, pytz_utc)
    >>> assert abs(dt - match_dt) <= dt.resolution
    """

    lowDateTime, highDateTime = nt_timestamp_tuple
    sec_past_nt_epoch = ((highDateTime << 32) + lowDateTime) * 1.0e-7

    if return_datetime:
        return UTC_NT_EPOCH + datetime.timedelta(seconds=sec_past_nt_epoch)

    else:
        sec_past_unix_epoch = sec_past_nt_epoch - EPOCH_DELTA_SECONDS
        return sec_past_unix_epoch

unix_to_datetime(unix_timestamp)

:param unix_timestamp: Number of seconds since unix epoch (1/1/1970)
:type unix_timestamp: float

:param tz: timezone to use for conversion (default None = UTC)
:type tz: None or tzinfo object (see datetime docs)

:returns: datetime object
:raises: ValueError if unix_timestamp is not of type float or datetime

Returns a datetime object from a unix timestamp. Simple wrapper for :func:`datetime.datetime.fromtimestamp`

>>> from pytz import utc
>>> from datetime import datetime
>>> epoch = unix_to_datetime(0.0, tz=utc)
>>> assert epoch == datetime(1970, 1, 1, tzinfo=utc)

Source code in src\aalibrary\utils\sonar_checker\ek_date_conversion.py
def unix_to_datetime(unix_timestamp):
    """
    :param unix_timestamp: Number of seconds since unix epoch (1/1/1970)
    :type unix_timestamp: float

    :param tz: timezone to use for conversion (default None = UTC)
    :type tz: None or tzinfo object (see datetime docs)

    :returns: datetime object
    :raises: ValueError if unix_timestamp is not of type float or datetime

    Returns a datetime object from a unix timestamp.  Simple wrapper for
    :func:`datetime.datetime.fromtimestamp`

    >>> from pytz import utc
    >>> from datetime import datetime
    >>> epoch = unix_to_datetime(0.0, tz=utc)
    >>> assert epoch == datetime(1970, 1, 1, tzinfo=utc)
    """

    if isinstance(unix_timestamp, datetime.datetime):
        if unix_timestamp.tzinfo is None:
            unix_datetime = pytz_utc.localize(unix_timestamp)

        elif unix_timestamp.tzinfo == pytz_utc:
            unix_datetime = unix_timestamp

        else:
            unix_datetime = pytz_utc.normalize(unix_timestamp.astimezone(pytz_utc))

    elif isinstance(unix_timestamp, float):
        unix_datetime = pytz_utc.localize(datetime.datetime.fromtimestamp(unix_timestamp))

    else:
        errstr = "Looking for a timestamp of type datetime.datetime or # of sec past unix epoch.\n"
        errstr += "Supplied timestamp '%s' of type %s." % (
            str(unix_timestamp),
            type(unix_timestamp),
        )
        raise ValueError(errstr)

    return unix_datetime

unix_to_nt(unix_timestamp)

Given a date, return the 2-element tuple used for timekeeping with SIMRAD echosounders

Simple conversion:

>>> dt = datetime.datetime(2011, 12, 23, 20, 54, 3, 964000, pytz_utc)
>>> assert (19496896L, 30196149L) == unix_to_nt(dt)

Converting back and forth between the two standards:

>>> orig_dt = datetime.datetime.now(tz=pytz_utc)
>>> nt_tuple = unix_to_nt(orig_dt)

Converting back may not yield the exact original date, but will be within the datetime's precision:

>>> back_to_dt = nt_to_unix(nt_tuple)
>>> d_mu_seconds = abs(orig_dt - back_to_dt).microseconds
>>> mu_sec_resolution = orig_dt.resolution.microseconds
>>> assert d_mu_seconds <= mu_sec_resolution

Source code in src\aalibrary\utils\sonar_checker\ek_date_conversion.py
def unix_to_nt(unix_timestamp):
    """
    Given a date, return the 2-element tuple used for timekeeping with SIMRAD echosounders


    #Simple conversion
    >>> dt = datetime.datetime(2011, 12, 23, 20, 54, 3, 964000, pytz_utc)
    >>> assert (19496896L, 30196149L) == unix_to_nt(dt)

    #Converting back and forth between the two standards:
    >>> orig_dt = datetime.datetime.now(tz=pytz_utc)
    >>> nt_tuple = unix_to_nt(orig_dt)

    #converting back may not yield the exact original date,
    #but will be within the datetime's precision
    >>> back_to_dt = nt_to_unix(nt_tuple)
    >>> d_mu_seconds = abs(orig_dt - back_to_dt).microseconds
    >>> mu_sec_resolution = orig_dt.resolution.microseconds
    >>> assert d_mu_seconds <= mu_sec_resolution
    """

    if isinstance(unix_timestamp, datetime.datetime):
        if unix_timestamp.tzinfo is None:
            unix_datetime = pytz_utc.localize(unix_timestamp)

        elif unix_timestamp.tzinfo == pytz_utc:
            unix_datetime = unix_timestamp

        else:
            unix_datetime = pytz_utc.normalize(unix_timestamp.astimezone(pytz_utc))

    else:
        unix_datetime = unix_to_datetime(unix_timestamp)

    sec_past_nt_epoch = (unix_datetime - UTC_NT_EPOCH).total_seconds()

    onehundred_ns_intervals = int(sec_past_nt_epoch * 1e7)
    lowDateTime = onehundred_ns_intervals & 0xFFFFFFFF
    highDateTime = onehundred_ns_intervals >> 32

    return lowDateTime, highDateTime
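
An illustrative round-trip sketch (assuming the module imports as aalibrary.utils.sonar_checker.ek_date_conversion, matching the source path above; because the conversion passes through floating-point seconds, a few microseconds of drift are possible, so the tolerance here is deliberately loose):

import datetime
from pytz import utc
from aalibrary.utils.sonar_checker.ek_date_conversion import nt_to_unix, unix_to_nt

now = datetime.datetime.now(tz=utc)
low, high = unix_to_nt(now)     # 2-element NT tuple (low, high)
back = nt_to_unix((low, high))  # back to a UTC datetime
assert abs(now - back) < datetime.timedelta(milliseconds=1)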

ek_raw_io

Code originally developed for pyEcholab (https://github.com/CI-CMG/pyEcholab) by Rick Towler rick.towler@noaa.gov at NOAA AFSC.

Contains low-level functions called by ./ek_raw_parsers.py

Classes:

Name Description
RawSimradFile

A low-level extension of the built in python file object allowing the reading/writing

RawSimradFile

Bases: BufferedReader

A low-level extension of the built-in Python file object allowing the reading/writing of SIMRAD RAW files on a datagram-by-datagram basis (instead of at the byte level).

Calls to the read method return parsed datagrams as dicts.

Methods:

Name Description
__next__

Returns the next datagram (synonymous with self.read(1))

iter_dgrams

Iterates through the file, repeatedly calling self.next() until

peek

Returns the header of the next datagram in the file. The file position is

prev

Returns the previous datagram 'behind' the current file pointer position

read

:param k: Number of datagrams to read

readall

Reads the entire file from the beginning and returns a list of datagrams.

readline

aliased to self.next()

readlines

aliased to self.read(-1)

seek

Performs the familiar 'seek' operation using datagram offsets

skip

Skips forward to the next datagram without reading the contents of the current one

skip_back

Skips backwards to the previous datagram without reading its contents

tell

Returns the current file pointer offset by datagram number

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
class RawSimradFile(BufferedReader):
    """
    A low-level extension of the built-in Python file object allowing the
    reading/writing of SIMRAD RAW files on a datagram-by-datagram basis
    (instead of at the byte level).

    Calls to the read method return parsed datagrams as dicts.
    """

    #: Dict object with datagram header/python class key/value pairs
    DGRAM_TYPE_KEY = {
        "RAW": parsers.SimradRawParser(),
        "CON": parsers.SimradConfigParser(),
        "TAG": parsers.SimradAnnotationParser(),
        "NME": parsers.SimradNMEAParser(),
        "BOT": parsers.SimradBottomParser(),
        "DEP": parsers.SimradDepthParser(),
        "XML": parsers.SimradXMLParser(),
        "IDX": parsers.SimradIDXParser(),
        "FIL": parsers.SimradFILParser(),
        "MRU": parsers.SimradMRUParser(),
    }

    def __init__(
        self,
        name,
        mode="rb",
        closefd=True,
        return_raw=False,
        buffer_size=1024 * 1024,
        storage_options={},
    ):
        #  9-28-18 RHT: Changed RawSimradFile to implement BufferedReader instead of
        #  io.FileIO to increase performance.

        #  create a raw file object for the buffered reader
        fmap = fsspec.get_mapper(name, **storage_options)
        if isinstance(fmap.fs, LocalFileSystem):
            fio = FileIO(name, mode=mode, closefd=closefd)
        else:
            fio = fmap.fs.open(fmap.root)

        #  initialize the superclass
        super().__init__(fio, buffer_size=buffer_size)
        self._current_dgram_offset = 0
        self._total_dgram_count = None
        self._return_raw = return_raw

    def _seek_bytes(self, bytes_, whence=0):
        """
        :param bytes_: byte offset
        :type bytes_: int

        :param whence:

        Seeks a file by bytes instead of datagrams.
        """

        super().seek(bytes_, whence)

    def _tell_bytes(self):
        """
        Returns the file pointer position in bytes.
        """

        return super().tell()

    def _read_dgram_size(self):
        """
        Attempts to read the size of the next datagram in the file.
        """

        buf = self._read_bytes(4)
        if len(buf) != 4:
            self._seek_bytes(-len(buf), SEEK_CUR)
            raise DatagramReadError(
                "Short read while getting dgram size",
                (4, len(buf)),
                file_pos=(self._tell_bytes(), self.tell()),
            )
        else:
            return struct.unpack("=l", buf)[0]  # This return value is an int object.

    def _bytes_remaining(self):
        old_pos = self._tell_bytes()
        self._seek_bytes(0, SEEK_END)
        end_pos = self._tell_bytes()
        offset = end_pos - old_pos
        self._seek_bytes(old_pos, SEEK_SET)

        return offset

    def _read_timestamp(self):
        """
        Attempts to read the datagram timestamp.
        """

        buf = self._read_bytes(8)
        if len(buf) != 8:
            self._seek_bytes(-len(buf), SEEK_CUR)
            raise DatagramReadError(
                "Short read while getting timestamp",
                (8, len(buf)),
                file_pos=(self._tell_bytes(), self.tell()),
            )

        else:
            lowDateField, highDateField = struct.unpack("=2L", buf)
            #  11/26/19 - RHT - modified to return the raw bytes
            return lowDateField, highDateField, buf

    def _read_dgram_header(self):
        """
        :returns: dgram_size, dgram_type, (low_date, high_date)

        Attempts to read the datagram header consisting of:

            long        dgram_size
            char[4]     type
            long        lowDateField
            long        highDateField
        """

        try:
            dgram_size = self._read_dgram_size()
        except Exception:
            if self.at_eof():
                raise SimradEOF()
            else:
                raise

        #  get the datagram type
        buf = self._read_bytes(4)

        if len(buf) != 4:
            if self.at_eof():
                raise SimradEOF()
            else:
                self._seek_bytes(-len(buf), SEEK_CUR)
                raise DatagramReadError(
                    "Short read while getting dgram type",
                    (4, len(buf)),
                    file_pos=(self._tell_bytes(), self.tell()),
                )
        else:
            dgram_type = buf
        dgram_type = dgram_type.decode("latin_1")

        #  11/26/19 - RHT
        #  As part of the rewrite of read to remove the reverse seeking,
        #  store the raw header bytes so we can prepend them to the raw
        #  data bytes and pass it all to the parser.
        raw_bytes = buf

        #  read the timestamp - this method was also modified to return
        #  the raw bytes
        lowDateField, highDateField, buf = self._read_timestamp()

        #  add the timestamp bytes to the raw_bytes string
        raw_bytes += buf

        return dict(
            size=dgram_size,
            type=dgram_type,
            low_date=lowDateField,
            high_date=highDateField,
            raw_bytes=raw_bytes,
        )

    def _read_bytes(self, k):
        """
        Reads raw bytes from the file
        """

        return super().read(k)

    def _read_next_dgram(self):
        """
        Attempts to read the next datagram from the file.

        Returns the datagram as a raw string
        """

        #  11/26/19 - RHT - Modified this method so it doesn't "peek"
        #  at the next datagram before reading which was inefficient.
        #  To minimize changes to the code, methods to read the header
        #  and timestamp were modified to return the raw bytes which
        #  allows us to pass them onto the parser without having to
        #  rewind and read again as was previously done.

        #  store our current location in the file
        old_file_pos = self._tell_bytes()

        #  try to read the header of the next datagram
        try:
            header = self._read_dgram_header()
        except DatagramReadError as e:
            e.message = "Short read while getting raw file datagram header"
            raise e

        #  check for invalid time data
        if (header["low_date"], header["high_date"]) == (0, 0):
            logger.warning(
                "Skipping %s datagram w/ timestamp of (0, 0) at %sL:%d",
                header["type"],
                str(self._tell_bytes()),
                self.tell(),
            )
            self.skip()
            return self._read_next_dgram()

        #  basic sanity check on size
        if header["size"] < 16:
            #  size can't be smaller than the header size
            logger.warning(
                "Invalid datagram header: size: %d, type: %s, nt_date: %s.  dgram_size < 16",
                header["size"],
                header["type"],
                str((header["low_date"], header["high_date"])),
            )

            #  see if we can find the next datagram
            self._find_next_datagram()

            #  and then return that
            return self._read_next_dgram()

        #  get the raw bytes from the header
        raw_dgram = header["raw_bytes"]

        #  and append the rest of the datagram - we subtract 12
        #  since we have already read 12 bytes: 4 for type and
        #  8 for time.
        raw_dgram += self._read_bytes(header["size"] - 12)

        #  determine the size of the payload in bytes
        bytes_read = len(raw_dgram)

        #  and make sure it checks out
        if bytes_read < header["size"]:
            logger.warning(
                "Datagram %d (@%d) shorter than expected length:  %d < %d",
                self.tell(),
                old_file_pos,
                bytes_read,
                header["size"],
            )
            self._find_next_datagram()
            return self._read_next_dgram()

        #  now read the trailing size value
        try:
            dgram_size_check = self._read_dgram_size()
        except DatagramReadError as e:
            self._seek_bytes(old_file_pos, SEEK_SET)
            e.message = "Short read while getting trailing raw file datagram size for check"
            raise e

        #  make sure they match
        if header["size"] != dgram_size_check:
            # self._seek_bytes(old_file_pos, SEEK_SET)
            logger.warning(
                "Datagram failed size check:  %d != %d @ (%d, %d)",
                header["size"],
                dgram_size_check,
                self._tell_bytes(),
                self.tell(),
            )
            logger.warning("Skipping to next datagram...")
            self._find_next_datagram()

            return self._read_next_dgram()

        #  add the header (16 bytes) and repeated size (4 bytes) to the payload
        #  bytes to get the total bytes read for this datagram.
        bytes_read = bytes_read + 20

        if self._return_raw:
            self._current_dgram_offset += 1
            return raw_dgram
        else:
            nice_dgram = self._convert_raw_datagram(raw_dgram, bytes_read)
            self._current_dgram_offset += 1
            return nice_dgram

    def _convert_raw_datagram(self, raw_datagram_string, bytes_read):
        """
        :param raw_datagram_string: bytestring containing datagram (first 4
            bytes indicate datagram type, such as 'RAW0')
        :type raw_datagram_string: str

        :param bytes_read: integer specifying the datagram size, including header
            in bytes,
        :type bytes_read: int

        Returns a formatted datagram object using the data in raw_datagram_string
        """

        #  11/26/19 - RHT - Modified this method to pass through the number of
        #  bytes read so we can bubble that up to the user.

        dgram_type = raw_datagram_string[:3].decode()
        try:
            parser = self.DGRAM_TYPE_KEY[dgram_type]
        except KeyError:
            # raise KeyError('Unknown datagram type %s,
            # valid types: %s' % (str(dgram_type),
            # str(self.DGRAM_TYPE_KEY.keys())))
            return raw_datagram_string

        nice_dgram = parser.from_string(raw_datagram_string, bytes_read)
        return nice_dgram

    def _set_total_dgram_count(self):
        """
        Skips quickly through the file counting datagrams and stores the
        resulting number in self._total_dgram_count

        :raises: ValueError if self._total_dgram_count is not None (it has been set before)
        """
        if self._total_dgram_count is not None:
            raise ValueError(
                "self._total_dgram_count has already been set.  Call .reset() first if you really want to recount"  # noqa
            )

        # Save current position for later
        old_file_pos = self._tell_bytes()
        old_dgram_offset = self.tell()

        self._current_dgram_offset = 0
        self._seek_bytes(0, SEEK_SET)

        while True:
            try:
                self.skip()
            except (DatagramReadError, SimradEOF):
                self._total_dgram_count = self.tell()
                break

        # Return to where we started
        self._seek_bytes(old_file_pos, SEEK_SET)
        self._current_dgram_offset = old_dgram_offset

    def at_eof(self):
        old_pos = self._tell_bytes()
        self._seek_bytes(0, SEEK_END)
        eof_pos = self._tell_bytes()

        # Check to see if we're at the end of file and raise EOF
        if old_pos == eof_pos:
            return True

        # Otherwise, go back to where we were and re-raise the original
        # exception
        else:
            offset = old_pos - eof_pos
            self._seek_bytes(offset, SEEK_END)
            return False

    def read(self, k):
        """
        :param k: Number of datagrams to read
        :type k: int

        Reads the next k datagrams.  A list of datagrams is returned if k > 1.  If
        k < 0 the call is equivalent to readall(), which rewinds to the beginning
        of the file and returns every datagram.
        """

        if k == 1:
            try:
                return self._read_next_dgram()
            except Exception:
                if self.at_eof():
                    raise SimradEOF()
                else:
                    raise

        elif k > 0:
            dgram_list = []

            for m in range(k):
                try:
                    dgram = self._read_next_dgram()
                    dgram_list.append(dgram)

                except Exception:
                    break

            return dgram_list

        elif k < 0:
            return self.readall()

    def readall(self):
        """
        Reads the entire file from the beginning and returns a list of datagrams.
        """

        self.seek(0, SEEK_SET)
        dgram_list = []

        for raw_dgram in self.iter_dgrams():
            dgram_list.append(raw_dgram)

        return dgram_list

    def _find_next_datagram(self):
        old_file_pos = self._tell_bytes()
        logger.warning("Attempting to find next valid datagram...")

        try:
            while self.peek()["type"][:3] not in list(self.DGRAM_TYPE_KEY.keys()):
                self._seek_bytes(1, 1)
        except DatagramReadError:
            logger.warning("No next datagram found. Ending reading of file.")
            raise SimradEOF()
        else:
            logger.warning("Found next datagram:  %s", self.peek())
            logger.warning("Skipped ahead %d bytes", self._tell_bytes() - old_file_pos)

    def tell(self):
        """
        Returns the current file pointer offset by datagram number
        """
        return self._current_dgram_offset

    def peek(self):
        """
        Returns the header of the next datagram in the file.  The file position is
        reset back to the original location afterwards.

        :returns: [dgram_size, dgram_type, (low_date, high_date)]
        """

        dgram_header = self._read_dgram_header()
        if dgram_header["type"].startswith("RAW0"):
            dgram_header["channel"] = struct.unpack("h", self._read_bytes(2))[0]
            self._seek_bytes(-18, SEEK_CUR)
        elif dgram_header["type"].startswith("RAW3"):
            chan_id = struct.unpack("128s", self._read_bytes(128))[0]
            dgram_header["channel_id"] = chan_id.strip(b"\x00").decode("latin_1")
            self._seek_bytes(-(16 + 128), SEEK_CUR)
        else:
            self._seek_bytes(-16, SEEK_CUR)

        return dgram_header

    def __next__(self):
        """
        Returns the next datagram (synonymous with self.read(1))
        """

        return self.read(1)

    def prev(self):
        """
        Returns the previous datagram 'behind' the current file pointer position
        """

        self.skip_back()
        raw_dgram = self.read(1)
        self.skip_back()
        return raw_dgram

    def skip(self):
        """
        Skips forward to the next datagram without reading the contents of the current one
        """

        # dgram_size, dgram_type, (low_date, high_date) = self.peek()[:3]

        header = self.peek()

        if header["size"] < 16:
            logger.warning(
                "Invalid datagram header: size: %d, type: %s, nt_date: %s.  dgram_size < 16",
                header["size"],
                header["type"],
                str((header["low_date"], header["high_date"])),
            )

            self._find_next_datagram()

        else:
            self._seek_bytes(header["size"] + 4, SEEK_CUR)
            dgram_size_check = self._read_dgram_size()

            if header["size"] != dgram_size_check:
                logger.warning(
                    "Datagram failed size check:  %d != %d @ (%d, %d)",
                    header["size"],
                    dgram_size_check,
                    self._tell_bytes(),
                    self.tell(),
                )
                logger.warning("Skipping to next datagram... (in skip)")

                self._find_next_datagram()

        self._current_dgram_offset += 1

    def skip_back(self):
        """
        Skips backwards to the previous datagram without reading its contents
        """

        old_file_pos = self._tell_bytes()

        try:
            self._seek_bytes(-4, SEEK_CUR)
        except IOError:
            raise

        dgram_size_check = self._read_dgram_size()

        # Seek to the beginning of the datagram and read as normal
        try:
            self._seek_bytes(-(8 + dgram_size_check), SEEK_CUR)
        except IOError:
            raise DatagramSizeError

        try:
            dgram_size = self._read_dgram_size()

        except DatagramSizeError:
            logger.info("Error reading the datagram")
            self._seek_bytes(old_file_pos, SEEK_SET)
            raise

        if dgram_size_check != dgram_size:
            self._seek_bytes(old_file_pos, SEEK_SET)
            raise DatagramSizeError
        else:
            self._seek_bytes(-4, SEEK_CUR)

        self._current_dgram_offset -= 1

    def iter_dgrams(self):
        """
        Iterates through the file, repeatedly calling self.next() until
        the end of file is reached
        """

        while True:
            # new_dgram = self.next()
            # yield new_dgram

            try:
                new_dgram = next(self)
            except Exception:
                logger.debug("Caught EOF?")
                #  end the generator cleanly (PEP 479: raising StopIteration
                #  inside a generator becomes a RuntimeError)
                return

            yield new_dgram

    # Unsupported members
    def readline(self):
        """
        aliased to self.next()
        """
        return next(self)

    def readlines(self):
        """
        aliased to self.read(-1)
        """
        return self.read(-1)

    def seek(self, offset, whence):
        """
        Performs the familiar 'seek' operation using datagram offsets
        instead of raw bytes.
        """

        if whence == SEEK_SET:
            if offset < 0:
                raise ValueError("Cannot seek backwards from beginning of file")
            else:
                self._seek_bytes(0, SEEK_SET)
                self._current_dgram_offset = 0
        elif whence == SEEK_END:
            if offset > 0:
                raise ValueError("Use negative offsets when seeking backward from end of file")

            # Do we need to generate the total number of datagrams w/in the file?
            try:
                self._set_total_dgram_count()
                # Throws a value error if _total_dgram_count has already been set.  We can ignore it
            except ValueError:
                pass

            self._seek_bytes(0, SEEK_END)
            self._current_dgram_offset = self._total_dgram_count

        elif whence == SEEK_CUR:
            pass
        else:
            raise ValueError(
                "Illegal value for 'whence' (%s), use 0 (beginning), 1 (current), or 2 (end)"
                % (str(whence))
            )

        if offset > 0:
            for k in range(offset):
                self.skip()
        elif offset < 0:
            for k in range(-offset):
                self.skip_back()

    def reset(self):
        self._current_dgram_offset = 0
        self._total_dgram_count = None
        self._seek_bytes(0, SEEK_SET)
__next__()

Returns the next datagram (synonymous with self.read(1))

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def __next__(self):
    """
    Returns the next datagram (synonymous with self.read(1))
    """

    return self.read(1)
iter_dgrams()

Iterates through the file, repeatedly calling self.next() until the end of file is reached

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def iter_dgrams(self):
    """
    Iterates through the file, repeatedly calling self.next() until
    the end of file is reached
    """

    while True:
        # new_dgram = self.next()
        # yield new_dgram

        try:
            new_dgram = next(self)
        except Exception:
            logger.debug("Caught EOF?")
            raise StopIteration

        yield new_dgram
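
A minimal usage sketch, not part of the module: it assumes the reader class exported by ek_raw_io is named RawSimradFile (as in the pyEcholab code this module derives from) and uses a placeholder file path. iter_dgrams() yields one datagram at a time until the end of file.

from aalibrary.utils.sonar_checker.ek_raw_io import RawSimradFile  # assumed class name

fid = RawSimradFile("survey.raw")  # placeholder path
for dgram in fid.iter_dgrams():
    #  parsed datagrams are dicts; unknown types come back as raw bytes
    if isinstance(dgram, dict):
        print(dgram["type"], dgram["timestamp"])
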
peek()

Returns the header of the next datagram in the file. The file position is reset back to the original location afterwards.

:returns: [dgram_size, dgram_type, (low_date, high_date)]

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def peek(self):
    """
    Returns the header of the next datagram in the file.  The file position is
    reset back to the original location afterwards.

    :returns: [dgram_size, dgram_type, (low_date, high_date)]
    """

    dgram_header = self._read_dgram_header()
    if dgram_header["type"].startswith("RAW0"):
        dgram_header["channel"] = struct.unpack("h", self._read_bytes(2))[0]
        self._seek_bytes(-18, SEEK_CUR)
    elif dgram_header["type"].startswith("RAW3"):
        chan_id = struct.unpack("128s", self._read_bytes(128))[0]
        dgram_header["channel_id"] = chan_id.strip(b"\x00").decode("latin_1")
        self._seek_bytes(-(16 + 128), SEEK_CUR)
    else:
        self._seek_bytes(-16, SEEK_CUR)

    return dgram_header
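
A short sketch of peeking without consuming a datagram (assuming the RawSimradFile reader class and a placeholder path). The returned header dict always carries 'size', 'type', 'low_date' and 'high_date'; RAW0 headers additionally carry 'channel' and RAW3 headers 'channel_id'.

from aalibrary.utils.sonar_checker.ek_raw_io import RawSimradFile  # assumed class name

fid = RawSimradFile("survey.raw")  # placeholder path
header = fid.peek()                # does not advance the datagram offset
print(header["type"], header["size"])
if header["type"].startswith("RAW0"):
    print("next ping is on channel", header["channel"])
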
prev()

Returns the previous datagram 'behind' the current file pointer position

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def prev(self):
    """
    Returns the previous datagram 'behind' the current file pointer position
    """

    self.skip_back()
    raw_dgram = self.read(1)
    self.skip_back()
    return raw_dgram
read(k)

:param k: Number of datagrams to read
:type k: int

Reads the next k datagrams. A list of datagrams is returned if k > 1. If k < 0 the call is equivalent to readall(), which rewinds to the beginning of the file and returns every datagram.

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def read(self, k):
    """
    :param k: Number of datagrams to read
    :type k: int

    Reads the next k datagrams.  A list of datagrams is returned if k > 1.  If
    k < 0 the call is equivalent to readall(), which rewinds to the beginning
    of the file and returns every datagram.
    """

    if k == 1:
        try:
            return self._read_next_dgram()
        except Exception:
            if self.at_eof():
                raise SimradEOF()
            else:
                raise

    elif k > 0:
        dgram_list = []

        for m in range(k):
            try:
                dgram = self._read_next_dgram()
                dgram_list.append(dgram)

            except Exception:
                break

        return dgram_list

    elif k < 0:
        return self.readall()
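
A brief sketch of the read modes (assuming the RawSimradFile reader class and a placeholder path): read(1) returns a single datagram, read(k) a list, and readall() every datagram from the start of the file.

from aalibrary.utils.sonar_checker.ek_raw_io import RawSimradFile  # assumed class name

fid = RawSimradFile("survey.raw")   # placeholder path
first = fid.read(1)                 # a single datagram (dict, or bytes if the type is unknown)
batch = fid.read(25)                # a list of up to 25 datagrams (stops early at EOF)
everything = fid.readall()          # rewinds and returns every datagram in the file
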
readall()

Reads the entire file from the beginning and returns a list of datagrams.

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def readall(self):
    """
    Reads the entire file from the beginning and returns a list of datagrams.
    """

    self.seek(0, SEEK_SET)
    dgram_list = []

    for raw_dgram in self.iter_dgrams():
        dgram_list.append(raw_dgram)

    return dgram_list
readline()

aliased to self.next()

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def readline(self):
    """
    aliased to self.next()
    """
    return next(self)
readlines()

aliased to self.read(-1)

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def readlines(self):
    """
    aliased to self.read(-1)
    """
    return self.read(-1)
seek(offset, whence)

Performs the familiar 'seek' operation using datagram offsets instead of raw bytes.

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def seek(self, offset, whence):
    """
    Performs the familiar 'seek' operation using datagram offsets
    instead of raw bytes.
    """

    if whence == SEEK_SET:
        if offset < 0:
            raise ValueError("Cannot seek backwards from beginning of file")
        else:
            self._seek_bytes(0, SEEK_SET)
            self._current_dgram_offset = 0
    elif whence == SEEK_END:
        if offset > 0:
            raise ValueError("Use negative offsets when seeking backward from end of file")

        # Do we need to generate the total number of datagrams w/in the file?
        try:
            self._set_total_dgram_count()
            # Throws a value error if _total_dgram_count has already been set.  We can ignore it
        except ValueError:
            pass

        self._seek_bytes(0, SEEK_END)
        self._current_dgram_offset = self._total_dgram_count

    elif whence == SEEK_CUR:
        pass
    else:
        raise ValueError(
            "Illegal value for 'whence' (%s), use 0 (beginning), 1 (current), or 2 (end)"
            % (str(whence))
        )

    if offset > 0:
        for k in range(offset):
            self.skip()
    elif offset < 0:
        for k in range(-offset):
            self.skip_back()
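
A sketch of seeking by datagram offsets rather than bytes (assuming the RawSimradFile reader class and a placeholder path); the whence constants are the standard ones from the io module.

from io import SEEK_CUR, SEEK_END, SEEK_SET

from aalibrary.utils.sonar_checker.ek_raw_io import RawSimradFile  # assumed class name

fid = RawSimradFile("survey.raw")  # placeholder path
fid.seek(100, SEEK_SET)            # position at datagram offset 100
print(fid.tell())                  # -> 100 (a datagram count, not a byte offset)
fid.seek(5, SEEK_CUR)              # skip forward five datagrams
fid.seek(-1, SEEK_END)             # position at the start of the last datagram
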
skip()

Skips forward to the next datagram without reading the contents of the current one

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def skip(self):
    """
    Skips forward to the next datagram without reading the contents of the current one
    """

    # dgram_size, dgram_type, (low_date, high_date) = self.peek()[:3]

    header = self.peek()

    if header["size"] < 16:
        logger.warning(
            "Invalid datagram header: size: %d, type: %s, nt_date: %s.  dgram_size < 16",
            header["size"],
            header["type"],
            str((header["low_date"], header["high_date"])),
        )

        self._find_next_datagram()

    else:
        self._seek_bytes(header["size"] + 4, SEEK_CUR)
        dgram_size_check = self._read_dgram_size()

        if header["size"] != dgram_size_check:
            logger.warning(
                "Datagram failed size check:  %d != %d @ (%d, %d)",
                header["size"],
                dgram_size_check,
                self._tell_bytes(),
                self.tell(),
            )
            logger.warning("Skipping to next datagram... (in skip)")

            self._find_next_datagram()

    self._current_dgram_offset += 1
skip_back()

Skips backwards to the previous datagram without reading its contents

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def skip_back(self):
    """
    Skips backwards to the previous datagram without reading its contents
    """

    old_file_pos = self._tell_bytes()

    try:
        self._seek_bytes(-4, SEEK_CUR)
    except IOError:
        raise

    dgram_size_check = self._read_dgram_size()

    # Seek to the beginning of the datagram and read as normal
    try:
        self._seek_bytes(-(8 + dgram_size_check), SEEK_CUR)
    except IOError:
        raise DatagramSizeError

    try:
        dgram_size = self._read_dgram_size()

    except DatagramSizeError:
        logger.info("Error reading the datagram")
        self._seek_bytes(old_file_pos, SEEK_SET)
        raise

    if dgram_size_check != dgram_size:
        self._seek_bytes(old_file_pos, SEEK_SET)
        raise DatagramSizeError
    else:
        self._seek_bytes(-4, SEEK_CUR)

    self._current_dgram_offset -= 1
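
A sketch combining skip() and skip_back() (assuming the RawSimradFile reader class and a placeholder path): skip() hops over the next datagram using only its header, and skip_back() rewinds one datagram so it can be read again.

from aalibrary.utils.sonar_checker.ek_raw_io import RawSimradFile  # assumed class name

fid = RawSimradFile("survey.raw")      # placeholder path
if fid.peek()["type"].startswith("NME"):
    fid.skip()                         # hop over an NMEA datagram without parsing it
dgram = fid.read(1)                    # read (and consume) the next datagram
fid.skip_back()                        # step back so the same datagram can be re-read
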
tell()

Returns the current file pointer offset by datagram number

Source code in src\aalibrary\utils\sonar_checker\ek_raw_io.py
def tell(self):
    """
    Returns the current file pointer offset by datagram number
    """
    return self._current_dgram_offset

ek_raw_parsers

Code originally developed for pyEcholab (https://github.com/CI-CMG/pyEcholab) by Rick Towler rick.towler@noaa.gov at NOAA AFSC.

The code has been modified to handle split-beam data and channel-transducer structure from different EK80 setups.

Classes:

Name Description
SimradAnnotationParser

ER60 Annotation datagram contains the following keys:

SimradBottomParser

Bottom Detection datagram contains the following keys:

SimradConfigParser

Simrad Configuration Datagram parser operates on dictionaries with the following keys:

SimradDepthParser

ER60 Depth Detection datagram (from .bot files) contain the following keys:

SimradNMEAParser

ER60 NMEA datagram contains the following keys:

SimradRawParser

Sample Data Datagram parser operates on dictionaries with the following keys:
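
A minimal routing sketch, not part of the module: it mirrors the way ek_raw_io hands each raw datagram to a parser via from_string(raw_bytes, bytes_read). The type-to-parser mapping and the parse_datagram helper shown here are illustrative.

from aalibrary.utils.sonar_checker.ek_raw_parsers import (
    SimradAnnotationParser,
    SimradNMEAParser,
    SimradRawParser,
)

#  illustrative mapping from the 3-character datagram type to a parser
PARSERS = {
    "TAG": SimradAnnotationParser(),
    "NME": SimradNMEAParser(),
    "RAW": SimradRawParser(),
}

def parse_datagram(raw_dgram):
    """Return a parsed dict for known types, or the raw bytes unchanged."""
    dgram_type = raw_dgram[:3].decode()   # e.g. 'RAW' from b'RAW3...'
    parser = PARSERS.get(dgram_type)
    if parser is None:
        return raw_dgram
    #  bytes_read mirrors the value ek_raw_io passes along (len(raw_dgram) + 20)
    return parser.from_string(raw_dgram, len(raw_dgram) + 20)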

SimradAnnotationParser

Bases: _SimradDatagramParser

ER60 Annotation datagram contains the following keys:

type:         string == 'TAG0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:     datetime.datetime object of NT date, assumed to be UTC

text:         Annotation

The following methods are defined:

from_string(str):    parse a raw ER60 Annotation datagram
                    (with leading/trailing datagram size stripped)

to_string():         Returns the datagram as a raw string
                     (including leading/trailing size fields)
                     ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradAnnotationParser(_SimradDatagramParser):
    """
    ER60 Annotation datagram contains the following keys:


        type:         string == 'TAG0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:     datetime.datetime object of NT date, assumed to be UTC

        text:         Annotation

    The following methods are defined:

        from_string(str):    parse a raw ER60 Annotation datagram
                            (with leading/trailing datagram size stripped)

        to_string():         Returns the datagram as a raw string
                             (including leading/trailing size fields)
                             ready for writing to disk
    """

    def __init__(self):
        headers = {0: [("type", "4s"), ("low_date", "L"), ("high_date", "L")]}

        _SimradDatagramParser.__init__(self, "TAG", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        """"""

        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )
        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                data[field] = data[field].decode()

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        #        if version == 0:
        #            data['text'] = raw_string[self.header_size(version):].strip('\x00')
        #            if isinstance(data['text'], bytes):
        #                data['text'] = data['text'].decode()

        if version == 0:
            if sys.version_info.major > 2:
                data["text"] = str(
                    raw_string[self.header_size(version) :].strip(b"\x00"),
                    "ascii",
                    errors="replace",
                )
            else:
                data["text"] = unicode(  # noqa
                    raw_string[self.header_size(version) :].strip("\x00"),
                    "ascii",
                    errors="replace",
                )

        return data

    def _pack_contents(self, data, version):
        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:
            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            if data["text"][-1] != "\x00":
                tmp_string = data["text"] + "\x00"
            else:
                tmp_string = data["text"]

            # Pad with more nulls to 4-byte word boundary if necessary
            if len(tmp_string) % 4:
                tmp_string += "\x00" * (4 - (len(tmp_string) % 4))

            datagram_fmt += "%ds" % (len(tmp_string))
            datagram_contents.append(tmp_string)

        return struct.pack(datagram_fmt, *datagram_contents)

SimradBottomParser

Bases: _SimradDatagramParser

Bottom Detection datagram contains the following keys:

type:         string == 'BOT0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:    datetime.datetime object of NT date converted to UTC
transceiver_count:  long uint with number of transceivers
depth:        [float], one value for each active channel

The following methods are defined:

from_string(str):    parse a raw ER60 Bottom datagram
                    (with leading/trailing datagram size stripped)

to_string():         Returns the datagram as a raw string
                     (including leading/trailing size fields)
                     ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradBottomParser(_SimradDatagramParser):
    """
    Bottom Detection datagram contains the following keys:

        type:         string == 'BOT0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:    datetime.datetime object of NT date converted to UTC
        transceiver_count:  long uint with number of transceivers
        depth:        [float], one value for each active channel

    The following methods are defined:

        from_string(str):    parse a raw ER60 Bottom datagram
                            (with leading/trailing datagram size stripped)

        to_string():         Returns the datagram as a raw string
                             (including leading/trailing size fields)
                             ready for writing to disk
    """

    def __init__(self):
        headers = {
            0: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("transceiver_count", "L"),
            ]
        }
        _SimradDatagramParser.__init__(self, "BOT", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        """"""

        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )
        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                data[field] = data[field].decode()

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        if version == 0:
            depth_fmt = "=%dd" % (data["transceiver_count"],)
            depth_size = struct.calcsize(depth_fmt)
            buf_indx = self.header_size(version)
            data["depth"] = np.fromiter(
                struct.unpack(depth_fmt, raw_string[buf_indx : buf_indx + depth_size]),  # noqa
                "float",
            )

        return data

    def _pack_contents(self, data, version):
        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:
            if len(data["depth"]) != data["transceiver_count"]:
                logger.warning(
                    "# of depth values %d does not match transceiver count %d",
                    len(data["depth"]),
                    data["transceiver_count"],
                )

                data["transceiver_count"] = len(data["depth"])

            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            datagram_fmt += "%dd" % (data["transceiver_count"])
            datagram_contents.extend(data["depth"])

        return struct.pack(datagram_fmt, *datagram_contents)
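
A short sketch of what a parsed BOT0 datagram looks like in practice (assuming the RawSimradFile reader class from ek_raw_io and a placeholder .bot path): the depth field is a NumPy array with one value per active channel.

from aalibrary.utils.sonar_checker.ek_raw_io import RawSimradFile  # assumed class name

for dgram in RawSimradFile("survey.bot").iter_dgrams():  # placeholder path
    if isinstance(dgram, dict) and dgram["type"].startswith("BOT"):
        #  one depth value per transceiver, as described above
        assert len(dgram["depth"]) == dgram["transceiver_count"]
        print(dgram["timestamp"], dgram["depth"])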

SimradConfigParser

Bases: _SimradDatagramParser

Simrad Configuration Datagram parser operates on dictionaries with the following keys:

type:         string == 'CON0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:    datetime.datetime object of NT date, assumed to be UTC

survey_name                     [str]
transect_name                   [str]
sounder_name                    [str]
version                         [str]
spare0                          [str]
transceiver_count               [long]
transceivers                    [list] List of dicts representing Transducer Configs:

ME70 Data contains the following additional values (data contained w/in first 14
    bytes of the spare0 field)

multiplexing                    [short]  Always 0
time_bias                       [long] difference between UTC and local time in min.
sound_velocity_avg              [float] [m/s]
sound_velocity_transducer       [float] [m/s]
beam_config                     [str] Raw XML string containing beam config. info

Transducer Config Keys (ER60/ES60/ES70 sounders):

channel_id                      [str]   channel ident string
beam_type                       [long]  Type of channel (0 = Single, 1 = Split)
frequency                       [float] channel frequency
equivalent_beam_angle           [float] dB
beamwidth_alongship             [float]
beamwidth_athwartship           [float]
angle_sensitivity_alongship     [float]
angle_sensitivity_athwartship   [float]
angle_offset_alongship          [float]
angle_offset_athwartship        [float]
pos_x                           [float]
pos_y                           [float]
pos_z                           [float]
dir_x                           [float]
dir_y                           [float]
dir_z                           [float]
pulse_length_table              [float[5]]
spare1                          [str]
gain_table                      [float[5]]
spare2                          [str]
sa_correction_table             [float[5]]
spare3                          [str]
gpt_software_version            [str]
spare4                          [str]

Transducer Config Keys (ME70 sounders):

channel_id                      [str]   channel ident string
beam_type                       [long]  Type of channel (0 = Single, 1 = Split)
reserved1                       [float] channel frequency
equivalent_beam_angle           [float] dB
beamwidth_alongship             [float]
beamwidth_athwartship           [float]
angle_sensitivity_alongship     [float]
angle_sensitivity_athwartship   [float]
angle_offset_alongship          [float]
angle_offset_athwartship        [float]
pos_x                           [float]
pos_y                           [float]
pos_z                           [float]
beam_steering_angle_alongship   [float]
beam_steering_angle_athwartship [float]
beam_steering_angle_unused      [float]
pulse_length                    [float]
reserved2                       [float]
spare1                          [str]
gain                            [float]
reserved3                       [float]
spare2                          [str]
sa_correction                   [float]
reserved4                       [float]
spare3                          [str]
gpt_software_version            [str]
spare4                          [str]

from_string(str): parse a raw config datagram (with leading/trailing datagram size stripped)

to_string(dict): Returns raw string (including leading/trailing size fields) ready for writing to disk

Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradConfigParser(_SimradDatagramParser):
    """
    Simrad Configuration Datagram parser operates on dictionaries with the following keys:

        type:         string == 'CON0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:    datetime.datetime object of NT date, assumed to be UTC

        survey_name                     [str]
        transect_name                   [str]
        sounder_name                    [str]
        version                         [str]
        spare0                          [str]
        transceiver_count               [long]
        transceivers                    [list] List of dicts representing Transducer Configs:

        ME70 Data contains the following additional values (data contained w/in first 14
            bytes of the spare0 field)

        multiplexing                    [short]  Always 0
        time_bias                       [long] difference between UTC and local time in min.
        sound_velocity_avg              [float] [m/s]
        sound_velocity_transducer       [float] [m/s]
        beam_config                     [str] Raw XML string containing beam config. info


    Transducer Config Keys (ER60/ES60/ES70 sounders):
        channel_id                      [str]   channel ident string
        beam_type                       [long]  Type of channel (0 = Single, 1 = Split)
        frequency                       [float] channel frequency
        equivalent_beam_angle           [float] dB
        beamwidth_alongship             [float]
        beamwidth_athwartship           [float]
        angle_sensitivity_alongship     [float]
        angle_sensitivity_athwartship   [float]
        angle_offset_alongship          [float]
        angle_offset_athwartship        [float]
        pos_x                           [float]
        pos_y                           [float]
        pos_z                           [float]
        dir_x                           [float]
        dir_y                           [float]
        dir_z                           [float]
        pulse_length_table              [float[5]]
        spare1                          [str]
        gain_table                      [float[5]]
        spare2                          [str]
        sa_correction_table             [float[5]]
        spare3                          [str]
        gpt_software_version            [str]
        spare4                          [str]

    Transducer Config Keys (ME70 sounders):
        channel_id                      [str]   channel ident string
        beam_type                       [long]  Type of channel (0 = Single, 1 = Split)
        reserved1                       [float] channel frequency
        equivalent_beam_angle           [float] dB
        beamwidth_alongship             [float]
        beamwidth_athwartship           [float]
        angle_sensitivity_alongship     [float]
        angle_sensitivity_athwartship   [float]
        angle_offset_alongship          [float]
        angle_offset_athwartship        [float]
        pos_x                           [float]
        pos_y                           [float]
        pos_z                           [float]
        beam_steering_angle_alongship   [float]
        beam_steering_angle_athwartship [float]
        beam_steering_angle_unused      [float]
        pulse_length                    [float]
        reserved2                       [float]
        spare1                          [str]
        gain                            [float]
        reserved3                       [float]
        spare2                          [str]
        sa_correction                   [float]
        reserved4                       [float]
        spare3                          [str]
        gpt_software_version            [str]
        spare4                          [str]

    from_string(str):   parse a raw config datagram
                        (with leading/trailing datagram size stripped)

    to_string(dict):    Returns raw string (including leading/trailing size fields)
                        ready for writing to disk
    """

    COMMON_KEYS = [
        ("channel_id", "128s"),
        ("beam_type", "l"),
        ("frequency", "f"),
        ("gain", "f"),
        ("equivalent_beam_angle", "f"),
        ("beamwidth_alongship", "f"),
        ("beamwidth_athwartship", "f"),
        ("angle_sensitivity_alongship", "f"),
        ("angle_sensitivity_athwartship", "f"),
        ("angle_offset_alongship", "f"),
        ("angle_offset_athwartship", "f"),
        ("pos_x", "f"),
        ("pos_y", "f"),
        ("pos_z", "f"),
        ("dir_x", "f"),
        ("dir_y", "f"),
        ("dir_z", "f"),
        ("pulse_length_table", "5f"),
        ("spare1", "8s"),
        ("gain_table", "5f"),
        ("spare2", "8s"),
        ("sa_correction_table", "5f"),
        ("spare3", "8s"),
        ("gpt_software_version", "16s"),
        ("spare4", "28s"),
    ]

    def __init__(self):
        headers = {
            0: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("survey_name", "128s"),
                ("transect_name", "128s"),
                ("sounder_name", "128s"),
                ("version", "30s"),
                ("spare0", "98s"),
                ("transceiver_count", "l"),
            ],
            1: [("type", "4s"), ("low_date", "L"), ("high_date", "L")],
        }

        _SimradDatagramParser.__init__(self, "CON", headers)

        self._transducer_headers = {
            "ER60": self.COMMON_KEYS,
            "ES60": self.COMMON_KEYS,
            "ES70": self.COMMON_KEYS,
            "MBES": [
                ("channel_id", "128s"),
                ("beam_type", "l"),
                ("frequency", "f"),
                ("reserved1", "f"),
                ("equivalent_beam_angle", "f"),
                ("beamwidth_alongship", "f"),
                ("beamwidth_athwartship", "f"),
                ("angle_sensitivity_alongship", "f"),
                ("angle_sensitivity_athwartship", "f"),
                ("angle_offset_alongship", "f"),
                ("angle_offset_athwartship", "f"),
                ("pos_x", "f"),
                ("pos_y", "f"),
                ("pos_z", "f"),
                ("beam_steering_angle_alongship", "f"),
                ("beam_steering_angle_athwartship", "f"),
                ("beam_steering_angle_unused", "f"),
                ("pulse_length", "f"),
                ("reserved2", "f"),
                ("spare1", "20s"),
                ("gain", "f"),
                ("reserved3", "f"),
                ("spare2", "20s"),
                ("sa_correction", "f"),
                ("reserved4", "f"),
                ("spare3", "20s"),
                ("gpt_software_version", "16s"),
                ("spare4", "28s"),
            ],
        }

    def _unpack_contents(self, raw_string, bytes_read, version):
        data = {}
        round6 = lambda x: round(x, ndigits=6)  # noqa
        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]

            #  handle Python 3 strings
            if (sys.version_info.major > 2) and isinstance(data[field], bytes):
                data[field] = data[field].decode("latin_1")

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        if version == 0:
            data["transceivers"] = {}

            for field in ["transect_name", "version", "survey_name", "sounder_name"]:
                data[field] = data[field].strip("\x00")

            sounder_name = data["sounder_name"]
            if sounder_name == "MBES":
                _me70_extra_values = struct.unpack("=hLff", data["spare0"][:14])
                data["multiplexing"] = _me70_extra_values[0]
                data["time_bias"] = _me70_extra_values[1]
                data["sound_velocity_avg"] = _me70_extra_values[2]
                data["sound_velocity_transducer"] = _me70_extra_values[3]
                data["spare0"] = data["spare0"][:14] + data["spare0"][14:].strip("\x00")

            else:
                data["spare0"] = data["spare0"].strip("\x00")

            buf_indx = self.header_size(version)

            try:
                transducer_header = self._transducer_headers[sounder_name]
                _sounder_name_used = sounder_name
            except KeyError:
                logger.warning(
                    "Unknown sounder_name:  %s, (no one of %s)",
                    sounder_name,
                    list(self._transducer_headers.keys()),
                )
                logger.warning("Will use ER60 transducer config fields as default")

                transducer_header = self._transducer_headers["ER60"]
                _sounder_name_used = "ER60"

            txcvr_header_fields = [x[0] for x in transducer_header]
            txcvr_header_fmt = "=" + "".join([x[1] for x in transducer_header])
            txcvr_header_size = struct.calcsize(txcvr_header_fmt)

            for txcvr_indx in range(1, data["transceiver_count"] + 1):
                txcvr_header_values_encoded = struct.unpack(
                    txcvr_header_fmt,
                    raw_string[buf_indx : buf_indx + txcvr_header_size],  # noqa
                )
                txcvr_header_values = list(txcvr_header_values_encoded)
                for tx_idx, tx_val in enumerate(txcvr_header_values_encoded):
                    if isinstance(tx_val, bytes):
                        txcvr_header_values[tx_idx] = tx_val.decode("latin_1")

                txcvr = data["transceivers"].setdefault(txcvr_indx, {})

                if _sounder_name_used in ["ER60", "ES60", "ES70"]:
                    for txcvr_field_indx, field in enumerate(txcvr_header_fields[:17]):
                        txcvr[field] = txcvr_header_values[txcvr_field_indx]

                    txcvr["pulse_length_table"] = np.fromiter(
                        list(map(round6, txcvr_header_values[17:22])), "float"
                    )
                    txcvr["spare1"] = txcvr_header_values[22]
                    txcvr["gain_table"] = np.fromiter(
                        list(map(round6, txcvr_header_values[23:28])), "float"
                    )
                    txcvr["spare2"] = txcvr_header_values[28]
                    txcvr["sa_correction_table"] = np.fromiter(
                        list(map(round6, txcvr_header_values[29:34])), "float"
                    )
                    txcvr["spare3"] = txcvr_header_values[34]
                    txcvr["gpt_software_version"] = txcvr_header_values[35]
                    txcvr["spare4"] = txcvr_header_values[36]

                elif _sounder_name_used == "MBES":
                    for txcvr_field_indx, field in enumerate(txcvr_header_fields):
                        txcvr[field] = txcvr_header_values[txcvr_field_indx]

                else:
                    raise RuntimeError(
                        "Unknown _sounder_name_used (Should not happen, this is a bug!)"
                    )

                txcvr["channel_id"] = txcvr["channel_id"].strip("\x00")
                txcvr["spare1"] = txcvr["spare1"].strip("\x00")
                txcvr["spare2"] = txcvr["spare2"].strip("\x00")
                txcvr["spare3"] = txcvr["spare3"].strip("\x00")
                txcvr["spare4"] = txcvr["spare4"].strip("\x00")
                txcvr["gpt_software_version"] = txcvr["gpt_software_version"].strip("\x00")

                buf_indx += txcvr_header_size

        elif version == 1:
            # CON1 only has a single data field:  beam_config, holding an xml string
            data["beam_config"] = raw_string[self.header_size(version) :].strip("\x00")

        return data

    def _pack_contents(self, data, version):
        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:
            if data["transceiver_count"] != len(data["transceivers"]):
                logger.warning("Mismatch between 'transceiver_count' and actual # of transceivers")
                data["transceiver_count"] = len(data["transceivers"])

            sounder_name = data["sounder_name"]
            if sounder_name == "MBES":
                _packed_me70_values = struct.pack(
                    "=hLff",
                    data["multiplexing"],
                    data["time_bias"],
                    data["sound_velocity_avg"],
                    data["sound_velocity_transducer"],
                )
                data["spare0"] = _packed_me70_values + data["spare0"][14:]

            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            try:
                transducer_header = self._transducer_headers[sounder_name]
                _sounder_name_used = sounder_name
            except KeyError:
                logger.warning(
                    "Unknown sounder_name:  %s, (no one of %s)",
                    sounder_name,
                    list(self._transducer_headers.keys()),
                )
                logger.warning("Will use ER60 transducer config fields as default")

                transducer_header = self._transducer_headers["ER60"]
                _sounder_name_used = "ER60"

            txcvr_header_fields = [x[0] for x in transducer_header]
            txcvr_header_fmt = "=" + "".join([x[1] for x in transducer_header])
            txcvr_header_size = struct.calcsize(txcvr_header_fmt)  # noqa

            for txcvr_indx, txcvr in list(data["transceivers"].items()):
                txcvr_contents = []

                if _sounder_name_used in ["ER60", "ES60", "ES70"]:
                    for field in txcvr_header_fields[:17]:
                        txcvr_contents.append(txcvr[field])

                    txcvr_contents.extend(txcvr["pulse_length_table"])
                    txcvr_contents.append(txcvr["spare1"])

                    txcvr_contents.extend(txcvr["gain_table"])
                    txcvr_contents.append(txcvr["spare2"])

                    txcvr_contents.extend(txcvr["sa_correction_table"])
                    txcvr_contents.append(txcvr["spare3"])

                    txcvr_contents.extend([txcvr["gpt_software_version"], txcvr["spare4"]])

                    txcvr_contents_str = struct.pack(txcvr_header_fmt, *txcvr_contents)

                elif _sounder_name_used == "MBES":
                    for field in txcvr_header_fields:
                        txcvr_contents.append(txcvr[field])

                    txcvr_contents_str = struct.pack(txcvr_header_fmt, *txcvr_contents)

                else:
                    raise RuntimeError(
                        "Unknown _sounder_name_used (Should not happen, this is a bug!)"
                    )

                datagram_fmt += "%ds" % (len(txcvr_contents_str))
                datagram_contents.append(txcvr_contents_str)

        elif version == 1:
            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            datagram_fmt += "%ds" % (len(data["beam_config"]))
            datagram_contents.append(data["beam_config"])

        return struct.pack(datagram_fmt, *datagram_contents)
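
A short sketch of reading the CON0 configuration and walking the per-transceiver configs (assuming the RawSimradFile reader class and a placeholder path); transceivers is a dict keyed 1..transceiver_count.

from aalibrary.utils.sonar_checker.ek_raw_io import RawSimradFile  # assumed class name

fid = RawSimradFile("survey.raw")   # placeholder path
config = fid.read(1)                # the CON0 datagram is typically first in ER60/ES60 files
print(config["survey_name"], config["sounder_name"])
for idx, txcvr in config["transceivers"].items():
    print(idx, txcvr["channel_id"], txcvr["frequency"])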

SimradDepthParser

Bases: _SimradDatagramParser

ER60 Depth Detection datagram (from .bot files) contain the following keys:

type:         string == 'DEP0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:    datetime.datetime object of NT date, assumed to be UTC
transceiver_count:  [long uint] with number of transceivers

depth:        [float], one value for each active channel
reflectivity: [float], one value for each active channel
unused:       [float], unused value for each active channel

The following methods are defined:

from_string(str):    parse a raw ER60 Depth datagram
                     (with leading/trailing datagram size stripped)

to_string():         Returns the datagram as a raw string
                     (including leading/trailing size fields)
                     ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradDepthParser(_SimradDatagramParser):
    """
    ER60 Depth Detection datagram (from .bot files) contain the following keys:

        type:         string == 'DEP0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:    datetime.datetime object of NT date, assumed to be UTC
        transceiver_count:  [long uint] with number of transceivers

        depth:        [float], one value for each active channel
        reflectivity: [float], one value for each active channel
        unused:       [float], unused value for each active channel

    The following methods are defined:

        from_string(str):    parse a raw ER60 Depth datagram
                             (with leading/trailing datagram size stripped)

        to_string():         Returns the datagram as a raw string
                             (including leading/trailing size fields)
                             ready for writing to disk

    """

    def __init__(self):
        headers = {
            0: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("transceiver_count", "L"),
            ]
        }
        _SimradDatagramParser.__init__(self, "DEP", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        """"""

        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )
        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                data[field] = data[field].decode()

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        if version == 0:
            data_fmt = "=3f"
            data_size = struct.calcsize(data_fmt)

            data["depth"] = np.zeros((data["transceiver_count"],))
            data["reflectivity"] = np.zeros((data["transceiver_count"],))
            data["unused"] = np.zeros((data["transceiver_count"],))

            buf_indx = self.header_size(version)
            for indx in range(data["transceiver_count"]):
                d, r, u = struct.unpack(
                    data_fmt, raw_string[buf_indx : buf_indx + data_size]  # noqa
                )
                data["depth"][indx] = d
                data["reflectivity"][indx] = r
                data["unused"][indx] = u

                buf_indx += data_size

        return data

    def _pack_contents(self, data, version):
        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:
            lengths = [
                len(data["depth"]),
                len(data["reflectivity"]),
                len(data["unused"]),
                data["transceiver_count"],
            ]

            if len(set(lengths)) != 1:
                min_indx = min(lengths)
                logger.warning("Data lengths mismatched:  d:%d, r:%d, u:%d, t:%d", *lengths)
                logger.warning("  Using minimum value:  %d", min_indx)
                data["transceiver_count"] = min_indx

            else:
                min_indx = data["transceiver_count"]

            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            datagram_fmt += "%df" % (3 * data["transceiver_count"])

            for indx in range(data["transceiver_count"]):
                datagram_contents.extend(
                    [
                        data["depth"][indx],
                        data["reflectivity"][indx],
                        data["unused"][indx],
                    ]
                )

        return struct.pack(datagram_fmt, *datagram_contents)

SimradFILParser

Bases: _SimradDatagramParser

EK80 FIL datagram contains the following keys:

type:               string == 'FIL1'
low_date:           long uint representing LSBytes of 64bit NT date
high_date:          long uint representing MSBytes of 64bit NT date
timestamp:          datetime.datetime object of NT date, assumed to be UTC
stage:              int
channel_id:         string
n_coefficients:     int
decimation_factor:  int
coefficients:       np.complex64

The following methods are defined:

from_string(str):    parse a raw EK80 FIL datagram
                    (with leading/trailing datagram size stripped)

to_string():         Returns the datagram as a raw string
                    (including leading/trailing size fields)
                     ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradFILParser(_SimradDatagramParser):
    """
    EK80 FIL datagram contains the following keys:


        type:               string == 'FIL1'
        low_date:           long uint representing LSBytes of 64bit NT date
        high_date:          long uint representing MSBytes of 64bit NT date
        timestamp:          datetime.datetime object of NT date, assumed to be UTC
        stage:              int
        channel_id:         string
        n_coefficients:     int
        decimation_factor:  int
        coefficients:       np.complex64

    The following methods are defined:

        from_string(str):    parse a raw EK80 FIL datagram
                            (with leading/trailing datagram size stripped)

        to_string():         Returns the datagram as a raw string
                            (including leading/trailing size fields)
                             ready for writing to disk
    """

    def __init__(self):
        headers = {
            1: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("stage", "h"),
                ("spare", "2s"),
                ("channel_id", "128s"),
                ("n_coefficients", "h"),
                ("decimation_factor", "h"),
            ]
        }

        _SimradDatagramParser.__init__(self, "FIL", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        data = {}
        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]

            #  handle Python 3 strings
            if (sys.version_info.major > 2) and isinstance(data[field], bytes):
                data[field] = data[field].decode("latin_1")

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        if version == 1:
            #  clean up the channel ID
            data["channel_id"] = data["channel_id"].strip("\x00")

            #  unpack the coefficients
            indx = self.header_size(version)
            block_size = data["n_coefficients"] * 8
            data["coefficients"] = np.frombuffer(
                raw_string[indx : indx + block_size], dtype="complex64"  # noqa
            )

        return data

    def _pack_contents(self, data, version):
        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:
            pass

        elif version == 1:
            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            #  pack the filter coefficients (complex64 -> raw bytes)
            coeff_bytes = np.ascontiguousarray(data["coefficients"], dtype="complex64").tobytes()
            datagram_fmt += "%ds" % (len(coeff_bytes))
            datagram_contents.append(coeff_bytes)

        return struct.pack(datagram_fmt, *datagram_contents)
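
For orientation, the coefficients block that follows the FIL1 header is simply a run of interleaved float32 real/imaginary pairs, which is why _unpack_contents can read it with a single np.frombuffer call. A minimal, self-contained sketch of that step (the byte values below are made up for illustration):

import struct

import numpy as np

# Three complex coefficients stored as interleaved float32 (real, imag) pairs,
# i.e. n_coefficients * 8 bytes, exactly as in the FIL1 datagram body.
coeff_bytes = struct.pack("<6f", 1.0, 0.0, 0.5, -0.5, 0.25, 0.75)
coefficients = np.frombuffer(coeff_bytes, dtype="complex64")
print(coefficients)  # [1.+0.j  0.5-0.5j  0.25+0.75j]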

SimradIDXParser

Bases: _SimradDatagramParser

ER60/EK80 IDX datagram contains the following keys:

type:         string == 'IDX0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:    datetime.datetime object of NT date, assumed to be UTC
ping_number:  int
distance :    float
latitude:     float
longitude:    float
file_offset:  int

The following methods are defined:

from_string(str):   Parse a raw ER60/EK80 IDX datagram
                    (with leading/trailing datagram size stripped)

to_string():    Returns the datagram as a raw string (including leading/trailing size
                fields) ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradIDXParser(_SimradDatagramParser):
    """
    ER60/EK80 IDX datagram contains the following keys:


        type:         string == 'IDX0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:    datetime.datetime object of NT date, assumed to be UTC
        ping_number:  int
        distance :    float
        latitude:     float
        longitude:    float
        file_offset:  int

    The following methods are defined:

        from_string(str):   Parse a raw ER60/EK80 IDX datagram
                            (with leading/trailing datagram size stripped)

        to_string():    Returns the datagram as a raw string (including leading/trailing size
                        fields) ready for writing to disk
    """

    def __init__(self):
        headers = {
            0: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                # ('dummy', 'L'),   # There are 4 extra bytes in this datagram
                ("ping_number", "L"),
                ("distance", "d"),
                ("latitude", "d"),
                ("longitude", "d"),
                ("file_offset", "L"),
            ]
        }

        _SimradDatagramParser.__init__(self, "IDX", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        """
        Unpacks the data in raw_string into dictionary containing IDX data

        :param raw_string:
        :type raw_string: str

        :returns: None
        """

        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )
        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                #  first try to decode as utf-8 but fall back to latin_1 if that fails
                try:
                    data[field] = data[field].decode("utf-8")
                except:
                    data[field] = data[field].decode("latin_1")

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["timestamp"] = data["timestamp"].replace(tzinfo=None)
        data["bytes_read"] = bytes_read

        return data

    def _pack_contents(self, data, version):

        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:

            for field in self.header_fields(version):
                if isinstance(data[field], str):
                    data[field] = data[field].encode("latin_1")
                datagram_contents.append(data[field])

        return struct.pack(datagram_fmt, *datagram_contents)
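
The low_date/high_date pair carried by every one of these datagrams is the two 32-bit halves of a Windows NT (FILETIME) timestamp: 100-nanosecond ticks since 1601-01-01 UTC. The library converts it with its nt_to_unix helper; the sketch below shows the equivalent arithmetic and is illustrative only, as the actual helper may differ in details.

import datetime


def nt_words_to_datetime(low_date: int, high_date: int) -> datetime.datetime:
    """Combine the two 32-bit NT date words into a UTC datetime (sketch only)."""
    nt_ticks = (high_date << 32) | low_date  # 100 ns ticks since 1601-01-01 UTC
    epoch = datetime.datetime(1601, 1, 1, tzinfo=datetime.timezone.utc)
    return epoch + datetime.timedelta(microseconds=nt_ticks // 10)


# Round trip: encode a known datetime as NT ticks, split it into words, convert back.
delta = datetime.datetime(2021, 1, 1, tzinfo=datetime.timezone.utc) - datetime.datetime(
    1601, 1, 1, tzinfo=datetime.timezone.utc
)
ticks = int(delta.total_seconds()) * 10_000_000  # whole seconds -> 100 ns ticks
print(nt_words_to_datetime(ticks & 0xFFFFFFFF, ticks >> 32))  # 2021-01-01 00:00:00+00:00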

SimradMRUParser

Bases: _SimradDatagramParser

EK80 MRU datagram contains the following keys:

type:         string == 'MRU0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:    datetime.datetime object of NT date, assumed to be UTC
heave:        float
roll :        float
pitch:        float
heading:      float

Version 1 contains (from https://www3.mbari.org/products/mbsystem/formatdoc/KongsbergKmall/EMdgmFormat_RevH/html/kmBinary.html): # noqa

Status word           See 1)  uint32  4U
Latitude              deg     double  8F
Longitude             deg     double  8F
Ellipsoid height      m       float   4F
Roll                  deg     float   4F
Pitch                 deg     float   4F
Heading               deg     float   4F
Heave                 m       float   4F
Roll rate             deg/s   float   4F
Pitch rate            deg/s   float   4F
Yaw rate              deg/s   float   4F
North velocity        m/s     float   4F
East velocity         m/s     float   4F
Down velocity         m/s     float   4F
Latitude error        m       float   4F
Longitude error       m       float   4F
Height error          m       float   4F
Roll error            deg     float   4F
Pitch error           deg     float   4F
Heading error         deg     float   4F
Heave error           m       float   4F
North acceleration    m/s2    float   4F
East acceleration     m/s2    float   4F
Down acceleration     m/s2    float   4F
Delayed heave:        -       -       -
UTC seconds           s       uint32  4U
UTC nanoseconds       ns      uint32  4U
Delayed heave         m       float   4F

The following methods are defined:

from_string(str):   parse a raw EK80 MRU datagram
                    (with leading/trailing datagram size stripped)

to_string():        Returns the datagram as a raw string (including
                    leading/trailing size fields) ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradMRUParser(_SimradDatagramParser):
    """
    EK80 MRU datagram contains the following keys:


        type:         string == 'MRU0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:    datetime.datetime object of NT date, assumed to be UTC
        heave:        float
        roll :        float
        pitch:        float
        heading:      float

    Version 1 contains (from https://www3.mbari.org/products/mbsystem/formatdoc/KongsbergKmall/EMdgmFormat_RevH/html/kmBinary.html): # noqa

    Status word See 1)  uint32  4U
    Latitude    deg double  8F
    Longitude   deg double  8F
    Ellipsoid height    m   float   4F
    Roll    deg float   4F
    Pitch   deg float   4F
    Heading deg float   4F
    Heave   m   float   4F
    Roll rate   deg/s   float   4F
    Pitch rate  deg/s   float   4F
    Yaw rate    deg/s   float   4F
    North velocity  m/s float   4F
    East velocity   m/s float   4F
    Down velocity   m/s float   4F
    Latitude error  m   float   4F
    Longitude error m   float   4F
    Height error    m   float   4F
    Roll error  deg float   4F
    Pitch error deg float   4F
    Heading error   deg float   4F
    Heave error m   float   4F
    North acceleration  m/s2    float   4F
    East acceleration   m/s2    float   4F
    Down acceleration   m/s2    float   4F
    Delayed heave:  -   -   -
    UTC seconds s   uint32  4U
    UTC nanoseconds ns  uint32  4U
    Delayed heave   m   float   4F

    The following methods are defined:

        from_string(str):   parse a raw EK80 MRU datagram
                            (with leading/trailing datagram size stripped)

        to_string():        Returns the datagram as a raw string (including
                            leading/trailing size fields) ready for writing to disk
    """

    def __init__(self):
        headers = {
            0: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("heave", "f"),
                ("roll", "f"),
                ("pitch", "f"),
                ("heading", "f"),
            ],
            1: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("start_id", "4s"),  # KMB#
                ("status_word", "L"),
                ("dummy", "12s"),
                ("latitude", "d"),
                ("longitude", "d"),
                ("ellipsoid_height", "f"),
                ("roll", "f"),
                ("pitch", "f"),
                ("heading", "f"),
                ("heave", "f"),
                ("roll_rate", "f"),
                ("pitch_rate", "f"),
                ("yaw_rate", "f"),
                ("velocity_north", "f"),
                ("velocity_east", "f"),
                ("velocity_down", "f"),
                ("latitude_error", "f"),
                ("longitude_error", "f"),
                ("height_error", "f"),
                ("roll_error", "f"),
                ("pitch_error", "f"),
                ("heading_error", "f"),
                ("heave_error", "f"),
                ("accel_north", "f"),
                ("accel_east", "f"),
                ("accel_down", "f"),
                ("heave_delay_secs", "L"),
                ("heave_delay_usecs", "L"),
                ("heave_delay_m", "f"),
            ],
        }

        _SimradDatagramParser.__init__(self, "MRU", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        """
        Unpacks the data in raw_string into dictionary containing MRU data

        :param raw_string:
        :type raw_string: str

        :returns: None
        """

        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )
        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                #  first try to decode as utf-8 but fall back to latin_1 if that fails
                try:
                    data[field] = data[field].decode("utf-8")
                except:
                    data[field] = data[field].decode("latin_1")

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["timestamp"] = data["timestamp"].replace(tzinfo=None)
        data["bytes_read"] = bytes_read

        return data

    def _pack_contents(self, data, version):

        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:

            for field in self.header_fields(version):
                if isinstance(data[field], str):
                    data[field] = data[field].encode("latin_1")
                datagram_contents.append(data[field])

        return struct.pack(datagram_fmt, *datagram_contents)

SimradNMEAParser

Bases: _SimradDatagramParser

ER60 NMEA datagram contains the following keys:

type:         string == 'NME0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:     datetime.datetime object of NT date, assumed to be UTC

nmea_string:  full (original) NMEA string

The following methods are defined:

from_string(str):    parse a raw ER60 NMEA datagram
                    (with leading/trailing datagram size stripped)

to_string():         Returns the datagram as a raw string
                     (including leading/trailing size fields)
                     ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradNMEAParser(_SimradDatagramParser):
    """
    ER60 NMEA datagram contains the following keys:


        type:         string == 'NME0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:     datetime.datetime object of NT date, assumed to be UTC

        nmea_string:  full (original) NMEA string

    The following methods are defined:

        from_string(str):    parse a raw ER60 NMEA datagram
                            (with leading/trailing datagram size stripped)

        to_string():         Returns the datagram as a raw string
                             (including leading/trailing size fields)
                             ready for writing to disk
    """

    nmea_head_re = re.compile(r"\$[A-Za-z]{5},")  # noqa

    def __init__(self):
        headers = {
            0: [("type", "4s"), ("low_date", "L"), ("high_date", "L")],
            1: [("type", "4s"), ("low_date", "L"), ("high_date", "L"), ("port", "32s")],
        }

        _SimradDatagramParser.__init__(self, "NME", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        """
        Parses the NMEA string provided in raw_string

        :param raw_string:  Raw NMEA string (i.e. '$GPZDA,160012.71,11,03,2004,-1,00*7D')
        :type raw_string: str

        :returns: None
        """

        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )
        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                data[field] = data[field].decode()

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        # Remove trailing \x00 from the PORT field for NME1, rest of the datagram identical to NME0
        if version == 1:
            data["port"] = data["port"].strip("\x00")

        if version == 0 or version == 1:
            if sys.version_info.major > 2:
                data["nmea_string"] = str(
                    raw_string[self.header_size(version) :].strip(b"\x00"),
                    "ascii",
                    errors="replace",
                )
            else:
                data["nmea_string"] = unicode(  # noqa
                    raw_string[self.header_size(version) :].strip("\x00"),
                    "ascii",
                    errors="replace",
                )

            if self.nmea_head_re.match(data["nmea_string"][:7]) is not None:
                data["nmea_talker"] = data["nmea_string"][1:3]
                data["nmea_type"] = data["nmea_string"][3:6]
            else:
                data["nmea_talker"] = ""
                data["nmea_type"] = "UNKNOWN"

        return data

    def _pack_contents(self, data, version):
        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:
            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            if data["nmea_string"][-1] != "\x00":
                tmp_string = data["nmea_string"] + "\x00"
            else:
                tmp_string = data["nmea_string"]

            # Pad with more nulls to 4-byte word boundary if necessary
            if len(tmp_string) % 4:
                tmp_string += "\x00" * (4 - (len(tmp_string) % 4))

            datagram_fmt += "%ds" % (len(tmp_string))

            # Convert to python string if needed
            if isinstance(tmp_string, str):
                tmp_string = tmp_string.encode("ascii", errors="replace")

            datagram_contents.append(tmp_string)

        return struct.pack(datagram_fmt, *datagram_contents)
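
The talker/type split at the end of _unpack_contents is a simple slice once the header regex matches; a standalone example using the same pattern as nmea_head_re:

import re

nmea_head_re = re.compile(r"\$[A-Za-z]{5},")

nmea_string = "$GPZDA,160012.71,11,03,2004,-1,00*7D"
if nmea_head_re.match(nmea_string[:7]) is not None:
    nmea_talker = nmea_string[1:3]  # "GP"
    nmea_type = nmea_string[3:6]    # "ZDA"
    print(nmea_talker, nmea_type)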

SimradRawParser

Bases: _SimradDatagramParser

Sample Data Datagram parser operates on dictionaries with the following keys:

type:         string == 'RAW0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:    datetime.datetime object of NT date, assumed to be UTC

channel                         [short] Channel number
mode                            [short] 1 = Power only, 2 = Angle only 3 = Power & Angle
transducer_depth                [float]
frequency                       [float]
transmit_power                  [float]
pulse_length                    [float]
bandwidth                       [float]
sample_interval                 [float]
sound_velocity                  [float]
absorption_coefficient          [float]
heave                           [float]
roll                            [float]
pitch                           [float]
temperature                     [float]
heading                         [float]
transmit_mode                   [short] 0 = Active, 1 = Passive, 2 = Test, -1 = Unknown
spare0                          [str]
offset                          [long]
count                           [long]

power                           [numpy array] Unconverted power values (if present)
angle                           [numpy array] Unconverted angle values (if present)

from_string(str): parse a raw sample datagram (with leading/trailing datagram size stripped)

to_string(dict): Returns raw string (including leading/trailing size fields) ready for writing to disk

Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradRawParser(_SimradDatagramParser):
    """
    Sample Data Datagram parser operates on dictionaries with the following keys:

        type:         string == 'RAW0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:    datetime.datetime object of NT date, assumed to be UTC

        channel                         [short] Channel number
        mode                            [short] 1 = Power only, 2 = Angle only 3 = Power & Angle
        transducer_depth                [float]
        frequency                       [float]
        transmit_power                  [float]
        pulse_length                    [float]
        bandwidth                       [float]
        sample_interval                 [float]
        sound_velocity                  [float]
        absorption_coefficient          [float]
        heave                           [float]
        roll                            [float]
        pitch                           [float]
        temperature                     [float]
        heading                         [float]
        transmit_mode                   [short] 0 = Active, 1 = Passive, 2 = Test, -1 = Unknown
        spare0                          [str]
        offset                          [long]
        count                           [long]

        power                           [numpy array] Unconverted power values (if present)
        angle                           [numpy array] Unconverted angle values (if present)

    from_string(str):   parse a raw sample datagram
                        (with leading/trailing datagram size stripped)

    to_string(dict):    Returns raw string (including leading/trailing size fields)
                        ready for writing to disk
    """

    def __init__(self):
        headers = {
            0: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("channel", "h"),
                ("mode", "h"),
                ("transducer_depth", "f"),
                ("frequency", "f"),
                ("transmit_power", "f"),
                ("pulse_length", "f"),
                ("bandwidth", "f"),
                ("sample_interval", "f"),
                ("sound_velocity", "f"),
                ("absorption_coefficient", "f"),
                ("heave", "f"),
                ("roll", "f"),
                ("pitch", "f"),
                ("temperature", "f"),
                ("heading", "f"),
                ("transmit_mode", "h"),
                ("spare0", "6s"),
                ("offset", "l"),
                ("count", "l"),
            ],
            3: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("channel_id", "128s"),
                ("data_type", "h"),
                ("spare", "2s"),
                ("offset", "l"),
                ("count", "l"),
            ],
            4: [
                ("type", "4s"),
                ("low_date", "L"),
                ("high_date", "L"),
                ("channel_id", "128s"),
                ("data_type", "h"),
                ("spare", "2s"),
                ("offset", "l"),
                ("count", "l"),
            ],
        }
        _SimradDatagramParser.__init__(self, "RAW", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )

        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                data[field] = data[field].decode(encoding="unicode_escape")

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        if version == 0:
            if data["count"] > 0:
                block_size = data["count"] * 2
                indx = self.header_size(version)

                if int(data["mode"]) & 0x1:
                    data["power"] = np.frombuffer(
                        raw_string[indx : indx + block_size], dtype="int16"  # noqa
                    )
                    indx += block_size
                else:
                    data["power"] = None

                if int(data["mode"]) & 0x2:
                    data["angle"] = np.frombuffer(
                        raw_string[indx : indx + block_size], dtype="int8"  # noqa
                    )
                    data["angle"] = data["angle"].reshape((-1, 2))
                else:
                    data["angle"] = None

            else:
                data["power"] = np.empty((0,), dtype="int16")
                data["angle"] = np.empty((0, 2), dtype="int8")

        # RAW3 and RAW4 have the same format, only Datatype Bit 0-1 not used in RAW4
        elif version == 3 or version == 4:
            # result = 1j*Data[...,1]; result += Data[...,0]

            #  clean up the channel ID
            data["channel_id"] = data["channel_id"].strip("\x00")

            if data["count"] > 0:
                #  set the initial block size and indx value.
                block_size = data["count"] * 2
                indx = self.header_size(version)

                if data["data_type"] & 0b1:
                    data["power"] = np.frombuffer(
                        raw_string[indx : indx + block_size], dtype="int16"  # noqa
                    )
                    indx += block_size
                else:
                    data["power"] = None

                if data["data_type"] & 0b10:
                    data["angle"] = np.frombuffer(
                        raw_string[indx : indx + block_size], dtype="int8"  # noqa
                    )
                    data["angle"] = data["angle"].reshape((-1, 2))
                    indx += block_size
                else:
                    data["angle"] = None

                #  determine the complex sample data type - this is contained in bits 2 and 3
                #  of the datatype <short> value. I'm assuming the types are exclusive...
                data["complex_dtype"] = np.float16
                type_bytes = 2
                if data["data_type"] & 0b1000:
                    data["complex_dtype"] = np.float32
                    type_bytes = 8

                #  determine the number of complex samples
                data["n_complex"] = data["data_type"] >> 8

                #  unpack the complex samples
                if data["n_complex"] > 0:
                    #  determine the block size
                    block_size = data["count"] * data["n_complex"] * type_bytes

                    data["complex"] = np.frombuffer(
                        raw_string[indx : indx + block_size],  # noqa
                        dtype=data["complex_dtype"],
                    )
                    data["complex"].dtype = np.complex64
                    if version == 3:
                        data["complex"] = data["complex"].reshape((-1, data["n_complex"]))
                else:
                    data["complex"] = None

            else:
                data["power"] = np.empty((0,), dtype="int16")
                data["angle"] = np.empty((0,), dtype="int8")
                data["complex"] = np.empty((0,), dtype="complex64")
                data["n_complex"] = 0

        return data

    def _pack_contents(self, data, version):
        datagram_fmt = self.header_fmt(version)

        datagram_contents = []

        if version == 0:
            if data["count"] > 0:
                if (int(data["mode"]) & 0x1) and (len(data.get("power", [])) != data["count"]):
                    logger.warning(
                        "Data 'count' = %d, but contains %d power samples.  Ignoring power."
                    )
                    data["mode"] &= ~(1 << 0)

                if (int(data["mode"]) & 0x2) and (len(data.get("angle", [])) != data["count"]):
                    logger.warning(
                        "Data 'count' = %d, but contains %d angle samples.  Ignoring angle."
                    )
                    data["mode"] &= ~(1 << 1)

                if data["mode"] == 0:
                    logger.warning(
                        "Data 'count' = %d, but mode == 0.  Setting count to 0",
                        data["count"],
                    )
                    data["count"] = 0

            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            if data["count"] > 0:
                if int(data["mode"]) & 0x1:
                    datagram_fmt += "%dh" % (data["count"])
                    datagram_contents.extend(data["power"])

                if int(data["mode"]) & 0x2:
                    datagram_fmt += "%dH" % (data["count"])
                    datagram_contents.extend(data["angle"])

        return struct.pack(datagram_fmt, *datagram_contents)
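
For RAW3/RAW4 datagrams the data_type word is a small bit field: bit 0 flags power samples, bit 1 flags angle samples, bit 3 flags float32 (rather than float16) complex components, and the upper byte (data_type >> 8) gives the number of complex samples per range cell. A standalone sketch of that decoding, with an illustrative value:

data_type = (4 << 8) | 0b1000  # illustrative: 4 complex samples, float32 components

has_power = bool(data_type & 0b1)              # False
has_angle = bool(data_type & 0b10)             # False
complex_is_float32 = bool(data_type & 0b1000)  # True
n_complex = data_type >> 8                     # 4
print(has_power, has_angle, complex_is_float32, n_complex)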

SimradXMLParser

Bases: _SimradDatagramParser

EK80 XML datagram contains the following keys:

type:         string == 'XML0'
low_date:     long uint representing LSBytes of 64bit NT date
high_date:    long uint representing MSBytes of 64bit NT date
timestamp:    datetime.datetime object of NT date, assumed to be UTC
subtype:      string representing Simrad XML datagram type:
              configuration, environment, or parameter

[subtype]:    dict containing the data specific to the XML subtype.

The following methods are defined:

from_string(str):    parse a raw EK80 XML datagram
                    (with leading/trailing datagram size stripped)

to_string():         Returns the datagram as a raw string
                     (including leading/trailing size fields)
                     ready for writing to disk
Source code in src\aalibrary\utils\sonar_checker\ek_raw_parsers.py
class SimradXMLParser(_SimradDatagramParser):
    """
    EK80 XML datagram contains the following keys:


        type:         string == 'XML0'
        low_date:     long uint representing LSBytes of 64bit NT date
        high_date:    long uint representing MSBytes of 64bit NT date
        timestamp:    datetime.datetime object of NT date, assumed to be UTC
        subtype:      string representing Simrad XML datagram type:
                      configuration, environment, or parameter

        [subtype]:    dict containing the data specific to the XML subtype.

    The following methods are defined:

        from_string(str):    parse a raw EK80 XML datagram
                            (with leading/trailing datagram size stripped)

        to_string():         Returns the datagram as a raw string
                             (including leading/trailing size fields)
                             ready for writing to disk
    """

    #  define the XML parsing options - here we define dictionaries for various xml datagram
    #  types. When parsing that xml datagram, these dictionaries are used to inform the parser about
    #  type conversion, name wrangling, and delimiter. If a field is missing, the parser
    #  assumes no conversion: type will be string, default mangling, and that there is only 1
    #  element.
    #
    #  the dicts are in the form:
    #       'XMLParamName':[converted type,'fieldname', 'parse char']
    #
    #  For example: 'PulseDurationFM':[float,'pulse_duration_fm',';']
    #
    #  will result in a return dictionary field named 'pulse_duration_fm' that contains a list
    #  of float values parsed from a string that uses ';' to separate values. Empty strings
    #  for fieldname and/or parse char result in the default action for those parsing steps.

    channel_parsing_options = {
        "MaxTxPowerTransceiver": [int, "", ""],
        "PulseDuration": [float, "", ";"],
        "PulseDurationFM": [float, "pulse_duration_fm", ";"],
        "SampleInterval": [float, "", ";"],
        "ChannelID": [str, "channel_id", ""],
        "HWChannelConfiguration": [str, "hw_channel_configuration", ""],
    }

    transceiver_parsing_options = {
        "TransceiverNumber": [int, "", ""],
        "Version": [str, "transceiver_version", ""],
        "IPAddress": [str, "ip_address", ""],
        "Impedance": [int, "", ""],
    }

    transducer_parsing_options = {
        "SerialNumber": [str, "transducer_serial_number", ""],
        "Frequency": [float, "transducer_frequency", ""],
        "FrequencyMinimum": [float, "transducer_frequency_minimum", ""],
        "FrequencyMaximum": [float, "transducer_frequency_maximum", ""],
        "BeamType": [int, "transducer_beam_type", ""],
        "Gain": [float, "", ";"],
        "SaCorrection": [float, "", ";"],
        "MaxTxPowerTransducer": [float, "", ""],
        "EquivalentBeamAngle": [float, "", ""],
        "BeamWidthAlongship": [float, "", ""],
        "BeamWidthAthwartship": [float, "", ""],
        "AngleSensitivityAlongship": [float, "", ""],
        "AngleSensitivityAthwartship": [float, "", ""],
        "AngleOffsetAlongship": [float, "", ""],
        "AngleOffsetAthwartship": [float, "", ""],
        "DirectivityDropAt2XBeamWidth": [
            float,
            "directivity_drop_at_2x_beam_width",
            "",
        ],
        "TransducerOffsetX": [float, "", ""],
        "TransducerOffsetY": [float, "", ""],
        "TransducerOffsetZ": [float, "", ""],
        "TransducerAlphaX": [float, "", ""],
        "TransducerAlphaY": [float, "", ""],
        "TransducerAlphaZ": [float, "", ""],
    }

    header_parsing_options = {"Version": [str, "application_version", ""]}

    envxdcr_parsing_options = {"SoundSpeed": [float, "transducer_sound_speed", ""]}

    environment_parsing_options = {
        "Depth": [float, "", ""],
        "Acidity": [float, "", ""],
        "Salinity": [float, "", ""],
        "SoundSpeed": [float, "", ""],
        "Temperature": [float, "", ""],
        "Latitude": [float, "", ""],
        "SoundVelocityProfile": [float, "", ";"],
        "DropKeelOffset": [float, "", ""],
        "DropKeelOffsetIsManual": [int, "", ""],
        "WaterLevelDraft": [float, "", ""],
        "WaterLevelDraftIsManual": [int, "", ""],
    }

    parameter_parsing_options = {
        "ChannelID": [str, "channel_id", ""],
        "ChannelMode": [int, "", ""],
        "PulseForm": [int, "", ""],
        "Frequency": [float, "", ""],
        "PulseDuration": [float, "", ""],
        "SampleInterval": [float, "", ""],
        "TransmitPower": [float, "", ""],
        "Slope": [float, "", ""],
    }

    def __init__(self):
        headers = {0: [("type", "4s"), ("low_date", "L"), ("high_date", "L")]}
        _SimradDatagramParser.__init__(self, "XML", headers)

    def _unpack_contents(self, raw_string, bytes_read, version):
        """
        Parses the NMEA string provided in raw_string

        :param raw_string:  Raw NMEA string (i.e. '$GPZDA,160012.71,11,03,2004,-1,00*7D')
        :type raw_string: str

        :returns: None
        """

        def dict_to_dict(xml_dict, data_dict, parse_opts):
            """
            dict_to_dict appends the ETree xml value dicts to a provided dictionary
            and along the way converts the key name to conform to the project's
            naming convention and optionally parses and or converts values as
            specified in the parse_opts dictionary.
            """

            for k in xml_dict:
                #  check if we're parsing this key/value
                if k in parse_opts:
                    #  try to parse the string
                    if parse_opts[k][2]:
                        try:
                            data = xml_dict[k].split(parse_opts[k][2])
                        except:
                            #  bad or empty parse character(s) provided
                            data = xml_dict[k]
                    else:
                        #  no parse char provided - nothing to parse
                        data = xml_dict[k]

                    #  try to convert to specified type
                    if isinstance(data, list):
                        for i in range(len(data)):
                            try:
                                data[i] = parse_opts[k][0](data[i])
                            except:
                                pass
                    else:
                        data = parse_opts[k][0](data)

                    #  and add the value to the provided dict
                    if parse_opts[k][1]:
                        #  add using the specified key name
                        data_dict[parse_opts[k][1]] = data
                    else:
                        #  add using the default key name wrangling
                        data_dict[camelcase2snakecase(k)] = data
                else:
                    #  nothing to do with the value string
                    data = xml_dict[k]

                    #  add the parameter to the provided dictionary
                    data_dict[camelcase2snakecase(k)] = data

        header_values = struct.unpack(
            self.header_fmt(version), raw_string[: self.header_size(version)]
        )
        data = {}

        for indx, field in enumerate(self.header_fields(version)):
            data[field] = header_values[indx]
            if isinstance(data[field], bytes):
                data[field] = data[field].decode()

        data["timestamp"] = nt_to_unix((data["low_date"], data["high_date"]))
        data["bytes_read"] = bytes_read

        if version == 0:
            if sys.version_info.major > 2:
                xml_string = str(
                    raw_string[self.header_size(version) :].strip(b"\x00"),
                    "ascii",
                    errors="replace",
                )
            else:
                xml_string = unicode(  # noqa
                    raw_string[self.header_size(version) :].strip("\x00"),
                    "ascii",
                    errors="replace",
                )

            #  get the ElementTree element
            root = ET.fromstring(xml_string)

            #  get the XML message type
            data["subtype"] = root.tag.lower()

            #  create the dictionary that contains the message data
            data[data["subtype"]] = {}

            #  parse it
            if data["subtype"] == "configuration":
                #  parse the Transceiver section
                for tcvr in root.iter("Transceiver"):
                    #  parse the Transceiver section
                    tcvr_xml = tcvr.attrib

                    #  parse the Channel section -- this works with multiple channels
                    #  under 1 transceiver
                    for tcvr_ch in tcvr.iter("Channel"):
                        tcvr_ch_xml = tcvr_ch.attrib
                        channel_id = tcvr_ch_xml["ChannelID"]

                        #  create the configuration dict for this channel
                        data["configuration"][channel_id] = {}

                        #  add the transceiver data to the config dict (this is
                        #  replicated for all channels)
                        dict_to_dict(
                            tcvr_xml,
                            data["configuration"][channel_id],
                            self.transceiver_parsing_options,
                        )

                        #  add the general channel data to the config dict
                        dict_to_dict(
                            tcvr_ch_xml,
                            data["configuration"][channel_id],
                            self.channel_parsing_options,
                        )

                        #  check if there are >1 transducer under a single transceiver channel
                        if len(list(tcvr_ch)) > 1:
                            ValueError("Found >1 transducer under a single transceiver channel!")
                        else:  # should only have 1 transducer
                            tcvr_ch_xducer = tcvr_ch.find(
                                "Transducer"
                            )  # get Element of this xducer
                            f_par = tcvr_ch_xducer.findall("FrequencyPar")
                            # Save calibration parameters
                            if f_par:
                                cal_par = {
                                    "frequency": np.array(
                                        [int(f.attrib["Frequency"]) for f in f_par]
                                    ),
                                    "gain": np.array([float(f.attrib["Gain"]) for f in f_par]),
                                    "impedance": np.array(
                                        [float(f.attrib["Impedance"]) for f in f_par]
                                    ),
                                    "phase": np.array([float(f.attrib["Phase"]) for f in f_par]),
                                    "beamwidth_alongship": np.array(
                                        [float(f.attrib["BeamWidthAlongship"]) for f in f_par]
                                    ),
                                    "beamwidth_athwartship": np.array(
                                        [float(f.attrib["BeamWidthAthwartship"]) for f in f_par]
                                    ),
                                    "angle_offset_alongship": np.array(
                                        [float(f.attrib["AngleOffsetAlongship"]) for f in f_par]
                                    ),
                                    "angle_offset_athwartship": np.array(
                                        [float(f.attrib["AngleOffsetAthwartship"]) for f in f_par]
                                    ),
                                }
                                data["configuration"][channel_id]["calibration"] = cal_par
                            #  add the transducer data to the config dict
                            dict_to_dict(
                                tcvr_ch_xducer.attrib,
                                data["configuration"][channel_id],
                                self.transducer_parsing_options,
                            )

                        # get unique transceiver channel number stored in channel_id
                        tcvr_ch_num = TCVR_CH_NUM_MATCHER.search(channel_id)[0]

                        # parse the Transducers section from the root
                        # TODO Remove Transducers if doesn't exist
                        xducer = root.find("Transducers")
                        if xducer is not None:
                            # build an occurrence lookup table of transducer names
                            xducer_name_list = []
                            for xducer_ch in xducer.iter("Transducer"):
                                xducer_name_list.append(xducer_ch.attrib["TransducerName"])

                            # find matching transducer for this channel_id
                            match_found = False
                            for xducer_ch in xducer.iter("Transducer"):
                                if not match_found:
                                    xducer_ch_xml = xducer_ch.attrib
                                    match_name = (
                                        xducer_ch.attrib["TransducerName"]
                                        == tcvr_ch_xducer.attrib["TransducerName"]
                                    )
                                    if xducer_ch.attrib["TransducerSerialNumber"] == "":
                                        match_sn = False
                                    else:
                                        match_sn = (
                                            xducer_ch.attrib["TransducerSerialNumber"]
                                            == tcvr_ch_xducer.attrib["SerialNumber"]
                                        )
                                    match_tcvr = (
                                        tcvr_ch_num in xducer_ch.attrib["TransducerCustomName"]
                                    )

                                    # if find match add the transducer mounting details
                                    if (
                                        Counter(xducer_name_list)[
                                            xducer_ch.attrib["TransducerName"]
                                        ]
                                        > 1
                                    ):
                                        # if more than one transducer has the same name
                                        # only check sn and transceiver unique number
                                        match_found = match_sn or match_tcvr
                                    else:
                                        match_found = match_name or match_sn or match_tcvr

                                    # add transducer mounting details
                                    if match_found:
                                        dict_to_dict(
                                            xducer_ch_xml,
                                            data["configuration"][channel_id],
                                            self.transducer_parsing_options,
                                        )

                        #  add the header data to the config dict
                        h = root.find("Header")
                        dict_to_dict(
                            h.attrib,
                            data["configuration"][channel_id],
                            self.header_parsing_options,
                        )

            elif data["subtype"] == "parameter":
                #  parse the parameter XML datagram
                for h in root.iter("Channel"):
                    parm_xml = h.attrib
                    #  add the data to the environment dict
                    dict_to_dict(parm_xml, data["parameter"], self.parameter_parsing_options)

            elif data["subtype"] == "environment":
                #  parse the environment XML datagram
                for h in root.iter("Environment"):
                    env_xml = h.attrib
                    #  add the data to the environment dict
                    dict_to_dict(env_xml, data["environment"], self.environment_parsing_options)

                for h in root.iter("Transducer"):
                    transducer_xml = h.attrib
                    #  add the data to the environment dict
                    dict_to_dict(
                        transducer_xml,
                        data["environment"],
                        self.envxdcr_parsing_options,
                    )

        data["xml"] = xml_string
        return data

    def _pack_contents(self, data, version):
        def to_CamelCase(xml_param):
            """
            Convert a name from the project's convention to CamelCase for
            writing back to XML in Kongsberg's convention.
            """
            idx = list(reversed([i for i, c in enumerate(xml_param) if c.isupper()]))
            param_len = len(xml_param)
            for i in idx:
                #  check if we should insert an underscore
                if i > 0 and i < param_len - 1:
                    xml_param = xml_param[:i] + "_" + xml_param[i:]
            xml_param = xml_param.lower()

            return xml_param

        datagram_fmt = self.header_fmt(version)
        datagram_contents = []

        if version == 0:
            for field in self.header_fields(version):
                datagram_contents.append(data[field])

            if data["nmea_string"][-1] != "\x00":
                tmp_string = data["nmea_string"] + "\x00"
            else:
                tmp_string = data["nmea_string"]

            # Pad with more nulls to 4-byte word boundary if necessary
            if len(tmp_string) % 4:
                tmp_string += "\x00" * (4 - (len(tmp_string) % 4))

            datagram_fmt += "%ds" % (len(tmp_string))

            # Convert to python string if needed
            if isinstance(tmp_string, str):
                tmp_string = tmp_string.encode("ascii", errors="replace")

            datagram_contents.append(tmp_string)

        return struct.pack(datagram_fmt, *datagram_contents)
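
The *_parsing_options dictionaries follow the convention described in the class comments: each entry maps an XML attribute name to [converted type, output field name, parse character]. Applied by hand to one entry (the attribute value below is made up for illustration), the convention looks like this:

# e.g. channel_parsing_options["PulseDuration"] == [float, "", ";"]
conv_type, field_name, parse_char = float, "", ";"

attribute_value = "0.000256;0.000512;0.001024"  # illustrative XML attribute text
values = [conv_type(v) for v in attribute_value.split(parse_char)]
print(values)  # [0.000256, 0.000512, 0.001024]
# An empty field name means the output key is derived via camelcase2snakecase
# ("PulseDuration" -> "pulse_duration").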

log

Functions:

Name Description
verbose

Set the verbosity for echopype print outs.

verbose(logfile=None, override=False)

Set the verbosity for echopype print outs. If called it will output logs to terminal by default.

Parameters

logfile : str, optional
    Optional string path to the desired log file.
override : bool
    Boolean flag to override verbosity, which turns off verbosity if the value is False.
    Default is False.

Returns

None

Source code in src\aalibrary\utils\sonar_checker\log.py
def verbose(logfile: Optional[str] = None, override: bool = False) -> None:
    """Set the verbosity for echopype print outs.
    If called it will output logs to terminal by default.

    Parameters
    ----------
    logfile : str, optional
        Optional string path to the desired log file.
    override: bool
        Boolean flag to override verbosity,
        which turns off verbosity if the value is `False`.
        Default is `False`.

    Returns
    -------
    None
    """
    if not isinstance(override, bool):
        raise ValueError("override argument must be a boolean!")
    package_name = __name__.split(".")[0]  # Get the package name
    loggers = _get_all_loggers()
    verbose = True if override is False else False
    _set_verbose(verbose)
    for logger in loggers:
        if package_name in logger.name:
            handlers = [h.name for h in logger.handlers]
            if logfile is None:
                if LOGFILE_HANDLE_NAME in handlers:
                    # Remove log file handler if it exists
                    handler = next(filter(lambda h: h.name == LOGFILE_HANDLE_NAME, logger.handlers))
                    logger.removeHandler(handler)
            elif LOGFILE_HANDLE_NAME not in handlers:
                # Only add the logfile handler if it doesn't exist
                _set_logfile(logger, logfile)

            if isinstance(logfile, str):
                # Prevents multiple handler from propagating messages
                # this way there are no duplicate line in logfile
                logger.propagate = False
            else:
                logger.propagate = True

misc

Functions:

Name Description
camelcase2snakecase

Convert string from CamelCase to snake_case

depth_from_pressure

Convert pressure to depth using UNESCO 1983 algorithm.

camelcase2snakecase(camel_case_str)

Convert string from CamelCase to snake_case e.g. CamelCase becomes camel_case.

Source code in src\aalibrary\utils\sonar_checker\misc.py
def camelcase2snakecase(camel_case_str):
    """
    Convert string from CamelCase to snake_case
    e.g. CamelCase becomes camel_case.
    """
    idx = list(reversed([i for i, c in enumerate(camel_case_str) if c.isupper()]))
    param_len = len(camel_case_str)
    for i in idx:
        #  check if we should insert an underscore
        if i > 0 and i < param_len:
            camel_case_str = camel_case_str[:i] + "_" + camel_case_str[i:]

    return camel_case_str.lower()
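
For example (import path assumed from the source location above):

from aalibrary.utils.sonar_checker.misc import camelcase2snakecase

print(camelcase2snakecase("CamelCase"))       # camel_case
print(camelcase2snakecase("DropKeelOffset"))  # drop_keel_offset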

depth_from_pressure(pressure, latitude=30.0, atm_pres_surf=0.0)

Convert pressure to depth using UNESCO 1983 algorithm.

UNESCO. 1983. Algorithms for computation of fundamental properties of seawater (Pressure to Depth conversion, pages 25-27). Prepared by Fofonoff, N.P. and Millard, R.C. UNESCO technical papers in marine science, 44. http://unesdoc.unesco.org/images/0005/000598/059832eb.pdf

Parameters

pressure : Union[float, FloatSequence]
    Pressure in dbar.
latitude : Union[float, FloatSequence], default=30.0
    Latitude in decimal degrees.
atm_pres_surf : Union[float, FloatSequence], default=0.0
    Atmospheric pressure at the surface in dbar. Use the default 0.0 value if pressure is
    corrected to be 0 at the surface. Otherwise, enter a correction for pressure due to air,
    sea ice and any other medium that may be present.

Returns

depth : NDArray[float]
    Depth in meters.

Source code in src\aalibrary\utils\sonar_checker\misc.py
def depth_from_pressure(
    pressure: Union[float, FloatSequence],
    latitude: Optional[Union[float, FloatSequence]] = 30.0,
    atm_pres_surf: Optional[Union[float, FloatSequence]] = 0.0,
) -> NDArray[float]:
    """
    Convert pressure to depth using UNESCO 1983 algorithm.

    UNESCO. 1983. Algorithms for computation of fundamental properties of seawater (Pressure to
    Depth conversion, pages 25-27). Prepared by Fofonoff, N.P. and Millard, R.C. UNESCO technical
    papers in marine science, 44. http://unesdoc.unesco.org/images/0005/000598/059832eb.pdf

    Parameters
    ----------
    pressure : Union[float, FloatSequence]
        Pressure in dbar
    latitude : Union[float, FloatSequence], default=30.0
        Latitude in decimal degrees.
    atm_pres_surf : Union[float, FloatSequence], default=0.0
        Atmospheric pressure at the surface in dbar.
        Use the default 0.0 value if pressure is corrected to be 0 at the surface.
        Otherwise, enter a correction for pressure due to air, sea ice and any other
        medium that may be present

    Returns
    -------
    depth : NDArray[float]
        Depth in meters
    """

    def _as_nparray_check(v, check_vs_pressure=False):
        """
        Convert to np.array if not already a np.array.
        Ensure latitude and atm_pres_surf are of the same size and shape as
        pressure if they are not scalar.
        """
        v_array = np.array(v) if not isinstance(v, np.ndarray) else v
        if check_vs_pressure:
            if v_array.size != 1:
                if v_array.size != pressure.size or v_array.shape != pressure.shape:
                    raise ValueError("Sequence shape or size does not match pressure")
        return v_array

    pressure = _as_nparray_check(pressure)
    latitude = _as_nparray_check(latitude, check_vs_pressure=True)
    atm_pres_surf = _as_nparray_check(atm_pres_surf, check_vs_pressure=True)

    # Constants
    g = 9.780318
    c1 = 9.72659
    c2 = -2.2512e-5
    c3 = 2.279e-10
    c4 = -1.82e-15
    k1 = 5.2788e-3
    k2 = 2.36e-5
    k3 = 1.092e-6

    # Calculate depth
    pressure = pressure - atm_pres_surf
    depth_w_g = c1 * pressure + c2 * pressure**2 + c3 * pressure**3 + c4 * pressure**4
    x = np.sin(np.deg2rad(latitude))
    gravity = g * (1.0 + k1 * x**2 + k2 * x**4) + k3 * pressure
    depth = depth_w_g / gravity
    return depth
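
A quick usage sketch (import path assumed from the source location above); at the default 30° latitude, 100 dbar works out to roughly 99.3 m:

import numpy as np

from aalibrary.utils.sonar_checker.misc import depth_from_pressure

print(depth_from_pressure(100.0))  # ~99.3 (metres)

# Sequences work too, provided latitude/atm_pres_surf are scalar or match the shape.
print(depth_from_pressure(np.array([10.0, 100.0, 1000.0]), latitude=60.0))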

sonar_checker

Functions:

Name Description
is_AD2CP

Check if the provided file has a .ad2cp extension.

is_AZFP

Check if the specified XML file contains an <InstrumentType> element with string="AZFP".

is_AZFP6

Check if the provided file has a .azfp extension.

is_EK60

Check if a raw data file is from Simrad EK60 echosounder.

is_EK80

Check if a raw data file is from Simrad EK80 echosounder.

is_ER60

Check if a raw data file is from Simrad EK60 echosounder.

is_AD2CP(raw_file)

Check if the provided file has a .ad2cp extension.

Parameters: raw_file (str): The name of the file to check.

Returns: bool: True if the file has a .ad2cp extension, False otherwise.

Source code in src\aalibrary\utils\sonar_checker\sonar_checker.py
def is_AD2CP(raw_file):
    """
    Check if the provided file has a .ad2cp extension.

    Parameters:
    raw_file (str): The name of the file to check.

    Returns:
    bool: True if the file has a .ad2cp extension, False otherwise.
    """

    # Check if the input is a string
    if not isinstance(raw_file, str):
        return False  # Return False if the input is not a string

    # Use the str.lower() method to check for the .ad2cp extension
    has_ad2cp_extension = raw_file.lower().endswith(".ad2cp")

    # Return the result of the check
    return has_ad2cp_extension

is_AZFP(raw_file)

Check if the specified XML file contains an <InstrumentType> element with string="AZFP".

Parameters: raw_file (str): The base name of the XML file (with or without extension).

Returns: bool: True if an <InstrumentType> element with string="AZFP" is found, False otherwise.

Source code in src\aalibrary\utils\sonar_checker\sonar_checker.py
def is_AZFP(raw_file):
    """
    Check if the specified XML file contains an <InstrumentType> with string="AZFP".

    Parameters:
    raw_file (str): The base name of the XML file (with or without extension).

    Returns:
    bool: True if <InstrumentType> with string="AZFP" is found, False otherwise.
    """

    # Check if the filename ends with .xml or .XML, and strip the extension if it
    # does (str.removesuffix removes the suffix itself; str.rstrip would strip
    # individual characters)
    base_filename = raw_file.removesuffix(".xml").removesuffix(".XML")

    # Create a list of possible filenames with both extensions
    possible_files = [f"{base_filename}.xml", f"{base_filename}.XML"]

    for full_filename in possible_files:
        if os.path.isfile(full_filename):
            try:
                # Parse the XML file
                tree = ET.parse(full_filename)
                root = tree.getroot()

                # Check for <InstrumentType> elements
                for instrument in root.findall(".//InstrumentType"):
                    if instrument.get("string") == "AZFP":
                        return True
            except ET.ParseError:
                print(f"Error parsing the XML file: {full_filename}.")

    return False

is_AZFP6(raw_file)

Check if the provided file has a .azfp extension.

Parameters: raw_file (str): The name of the file to check.

Returns: bool: True if the file has a .azfp extension, False otherwise.

Source code in src\aalibrary\utils\sonar_checker\sonar_checker.py
def is_AZFP6(raw_file):
    """
    Check if the provided file has a .azfp extension.

    Parameters:
    raw_file (str): The name of the file to check.

    Returns:
    bool: True if the file has a .azfp extension, False otherwise.
    """

    # Check if the input is a string
    if not isinstance(raw_file, str):
        return False  # Return False if the input is not a string

    # Use the str.lower() method to check for the .azfp extension
    has_azfp_extension = raw_file.lower().endswith(".azfp")

    # Return the result of the check
    return has_azfp_extension
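
As with is_AD2CP, this is a pure file-name check; a short sketch with hypothetical file names:

from aalibrary.utils.sonar_checker.sonar_checker import is_AZFP6

print(is_AZFP6("cast_042.azfp"))  # True
print(is_AZFP6("cast_042.raw"))   # False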

is_EK60(raw_file, storage_options)

Check if a raw data file is from Simrad EK60 echosounder.

Source code in src\aalibrary\utils\sonar_checker\sonar_checker.py
def is_EK60(raw_file, storage_options):
    """Check if a raw data file is from Simrad EK60 echosounder."""
    with RawSimradFile(raw_file, "r", storage_options=storage_options) as fid:
        config_datagram = fid.read(1)
        config_datagram["timestamp"] = np.datetime64(
            config_datagram["timestamp"].replace(tzinfo=None), "[ns]"
        )

        try:
            # Return True if the sounder name matches EK60 (or its ER60 software variant)
            return config_datagram["sounder_name"] in {"ER60", "EK60"}
        except KeyError:
            return False

is_EK80(raw_file, storage_options)

Check if a raw data file is from Simrad EK80 echosounder.

Source code in src\aalibrary\utils\sonar_checker\sonar_checker.py
def is_EK80(raw_file, storage_options):
    """Check if a raw data file is from Simrad EK80 echosounder."""
    with RawSimradFile(raw_file, "r", storage_options=storage_options) as fid:
        config_datagram = fid.read(1)
        config_datagram["timestamp"] = np.datetime64(
            config_datagram["timestamp"].replace(tzinfo=None), "[ns]"
        )

        # Return True if "configuration" exists in config_datagram
        return "configuration" in config_datagram

is_ER60(raw_file, storage_options)

Check if a raw data file is from Simrad EK60 echosounder.

Source code in src\aalibrary\utils\sonar_checker\sonar_checker.py
def is_ER60(raw_file, storage_options):
    """Check if a raw data file is from Simrad EK60 echosounder."""
    with RawSimradFile(raw_file, "r", storage_options=storage_options) as fid:
        config_datagram = fid.read(1)
        config_datagram["timestamp"] = np.datetime64(
            config_datagram["timestamp"].replace(tzinfo=None), "[ns]"
        )
        # Return True if the sounder name matches ER60 (or EK60)
        try:
            return config_datagram["sounder_name"] in {"ER60", "EK60"}
        except KeyError:
            return False
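
The three Simrad checkers all read the first configuration datagram of the .raw file. Below is a hedged sketch of how they could be combined into a simple model detector; the helper name and the anonymous-S3 storage_options value are assumptions for illustration, not part of the library:

from aalibrary.utils.sonar_checker.sonar_checker import is_EK60, is_EK80

def detect_simrad_model(raw_file: str, storage_options: dict = None) -> str:
    """Hypothetical helper: classify a Simrad .raw file by probing its
    configuration datagram with each checker in turn."""
    storage_options = storage_options or {}
    if is_EK80(raw_file, storage_options):
        return "EK80"
    if is_EK60(raw_file, storage_options):
        return "EK60/ER60"
    return "unknown"

# Local-file example; for the public NCEI bucket one might instead pass an
# s3:// URL with storage_options={"anon": True}.
print(detect_simrad_model("2107RL_CW-D20210813-T220732.raw"))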

timings

"This script deals with the times associated with ingesting/preprocessing data from various sources. It works as follows: * A large file (usually 1 GB) is selected to repeatedly be downloaded and uploaded to a GCP bucket. * Download and upload times are recorded for each of these n iterations. * The average of these times are presented.

Functions:

Name Description
time_ingestion_and_upload_from_ncei

Used for timing the ingestion from the NCEI AWS S3 bucket.

time_ingestion_and_upload_from_ncei(n=10, ncei_file_url='https://noaa-wcsd-pds.s3.amazonaws.com/data/raw/Reuben_Lasker/RL2107/EK80/2107RL_CW-D20210813-T220732.raw', ncei_bucket='noaa-wcsd-pds', download_location='./')

Used for timing the ingestion from the NCEI AWS S3 bucket.

Source code in src\aalibrary\utils\timings.py
def time_ingestion_and_upload_from_ncei(
    n: int = 10,
    ncei_file_url: str = (
        "https://noaa-wcsd-pds.s3.amazonaws.com/data/raw/"
        "Reuben_Lasker/RL2107/EK80/"
        "2107RL_CW-D20210813-T220732.raw"
    ),
    ncei_bucket: str = "noaa-wcsd-pds",
    download_location: str = "./",
):
    """Used for timing the ingestion from the NCEI AWS S3 bucket."""

    download_times = []
    upload_times = []
    file_name = helpers.get_file_name_from_url(ncei_file_url)

    for i in range(n):
        start_time = time.time()
        ncei_utils.download_single_file_from_aws(
            file_url=ncei_file_url,
            download_location=download_location,
        )
        time_elapsed = time.time() - start_time
        # Record the download time so the average can be computed later.
        download_times.append(time_elapsed)
        # Throughput estimate assumes the default ~1 GB (1000 MB) test file.
        print(
            (
                f"Downloading took {time_elapsed} seconds."
                f"\nThat's {1000/time_elapsed} MB/sec."
            )
        )
        print("Uploading file to cloud storage")
        start_time = time.time()
        cloud_utils.upload_file_to_gcp_bucket(
            bucket=None,
            blob_file_path="timing_test_raw_upload.raw",
            local_file_path=file_name,
        )
        time_elapsed = time.time() - start_time
        # Record the upload time so the average can be computed later.
        upload_times.append(time_elapsed)
        # Throughput estimate assumes the default ~1 GB (1000 MB) test file.
        print(
            (
                f"Uploading took {time_elapsed} seconds."
                f"\nThat's {1000/time_elapsed} MB/sec."
            )
        )

    print(
        (
            "Average download time for this file:"
            f" {sum(download_times)/len(download_times)}"
        )
    )
    print(
        (
            "Average upload time for this file:"
            f" {sum(upload_times)/len(upload_times)}"
        )
    )
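
A short usage sketch, assuming the module is importable as aalibrary.utils.timings and that valid GCP credentials are configured for the upload step; timing results are printed rather than returned:

from aalibrary.utils import timings

# Download the default ~1 GB NCEI test file twice and upload it to the
# configured GCP bucket each time, printing per-iteration and average timings.
timings.time_ingestion_and_upload_from_ncei(
    n=2,
    download_location="./",
)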