Downloads
Here are some examples of download/ingestion functions that you can use in this library.
Downloading A Raw File From NCEI
In order to download a raw file from NCEI, use the following example:
from aalibrary.ingestion import download_raw_file_from_ncei
# This function takes care of downloading, converting, and uploading (caching) the netcdf file in gcp.
download_raw_file_from_ncei(file_name="2107RL_CW-D20210813-T220732.raw",
                            file_type="raw",
                            ship_name="Reuben_Lasker",
                            survey_name="RL2107",
                            echosounder="EK80",
                            data_source="NCEI",
                            file_download_directory=".",
                            upload_to_gcp=True,  # Set to True if you want to upload the raw file to gcp
                            debug=False)
If you just want to download a raw file and do not care about its source, you can use the following function:
from aalibrary.ingestion import download_raw_file
download_raw_file(file_name="2107RL_CW-D20210813-T220732.raw",
                  file_type="raw",
                  ship_name="Reuben_Lasker",
                  survey_name="RL2107",
                  echosounder="EK80",
                  data_source="NCEI",
                  file_download_directory=".",
                  debug=False)
Downloading An Entire Survey From NCEI
Sometimes, you will need to download an entire survey from NCEI for analysis. This is possible using the AALibrary. Follow the code snippet below.
NOTE
This function will automatically create the appropriate subdirectories within the download_directory param that you have specified. For example: within the snippet below, the data will exist in ./test_data_dir/Reuben_Lasker/RL2107/...
from aalibrary.ingestion import download_survey_from_ncei
download_survey_from_ncei(ship_name="Reuben_Lasker",
                          survey_name="RL2107",
                          download_directory="./test_data_dir",
                          debug=False)
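As a rough illustration of the note above, the survey directory can be predicted with `pathlib` before or after the download. This is only a sketch: it assumes the layout is exactly `<download_directory>/<ship_name>/<survey_name>`, as the note's example suggests, and `expected_survey_dir` is a hypothetical helper that is not part of AALibrary.

```python
from pathlib import Path


def expected_survey_dir(download_directory: str, ship_name: str, survey_name: str) -> Path:
    """Return the directory where the survey's files are expected to land.

    Assumption: the layout is <download_directory>/<ship_name>/<survey_name>,
    taken from the example in the note above. This helper is hypothetical.
    """
    return Path(download_directory) / ship_name / survey_name


survey_dir = expected_survey_dir("./test_data_dir", "Reuben_Lasker", "RL2107")
print(survey_dir.as_posix())  # test_data_dir/Reuben_Lasker/RL2107
```

Once the download has finished, `survey_dir.is_dir()` is a quick way to check that the directory was actually created.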
Downloading A Raw File From Azure Data Lake (OMAO)
Use the following code if you would like to download a file from the Azure Data Lake. The code requires a config.ini file.
NOTE
This file needs to have a [DEFAULT] section with an azure_connection_string variable set. See Azure Configuration for more information.
from aalibrary.ingestion import download_raw_file_from_azure
download_raw_file_from_azure(
    file_name="1601RL-D20160107-T074016.raw",
    file_type="raw",
    ship_name="Reuben_Lasker",
    survey_name="RL1601",
    echosounder="EK60",
    data_source="OMAO",
    file_download_directory=".",
    config_file_path="./azure_config.ini",
    upload_to_gcp=True,
    debug=True,
)
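The config file described in the note above can be created and checked with Python's standard `configparser` module. The following is a minimal sketch, assuming the file is a plain INI file that `configparser` can read; the helper names `write_azure_config` and `read_azure_connection_string` are hypothetical, and the connection string is a placeholder.

```python
import configparser
import tempfile
from pathlib import Path


def write_azure_config(path: str, connection_string: str) -> None:
    """Write an INI file with a [DEFAULT] section holding the connection string."""
    # interpolation=None keeps any '%' characters in the value untouched.
    config = configparser.ConfigParser(interpolation=None)
    config["DEFAULT"] = {"azure_connection_string": connection_string}
    with open(path, "w") as config_file:
        config.write(config_file)


def read_azure_connection_string(path: str) -> str:
    """Read the connection string back, failing loudly if the file is missing."""
    config = configparser.ConfigParser(interpolation=None)
    if not config.read(path):
        raise FileNotFoundError(f"Config file not found: {path}")
    # Raises KeyError if azure_connection_string is not set in [DEFAULT].
    return config["DEFAULT"]["azure_connection_string"]


# Demonstration in a temporary directory, using a placeholder value.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = str(Path(tmp_dir) / "azure_config.ini")
    write_azure_config(config_path, "<your-azure-connection-string>")
    loaded_value = read_azure_connection_string(config_path)
    print(loaded_value)
```

Checking the file this way before calling the download function can catch a missing file or key early. How AALibrary itself parses the file is not shown here.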
If you would like to download a single file by its path, you can use the following, much simpler code:
from aalibrary.ingestion import download_specific_file_from_azure
download_specific_file_from_azure(
    config_file_path="./azure_config.ini",
    container_name="testcontainer",
    file_path_in_container="RL2107_EK80_WCSD_EK80-metadata.json",
)
NOTE
Please keep in mind that this method creates a new connection every time you call it, which can lead to long run times when it is called repeatedly.
Downloading A Netcdf
Netcdf files (converted from raw) currently exist only in the GCP cache. The following example downloads a particular raw file as netcdf4, provided it has already been converted and cached in GCP; otherwise, an error message is thrown:
from aalibrary import utils
from aalibrary.ingestion import download_netcdf_file
# Create a GCP bucket object
gcp_stor_client, gcp_bucket_name, gcp_bucket = utils.cloud_utils.setup_gcp_storage_objs()
# This function takes care of downloading the netcdf.
download_netcdf_file(
    raw_file_name="2107RL_CW-D20210813-T220732.raw",
    file_type="netcdf",
    ship_name="Reuben_Lasker",
    survey_name="RL2107",
    echosounder="EK80",
    data_source="NCEI",
    file_download_directory=".",
    gcp_bucket=gcp_bucket,
    debug=False)
Downloading Multiple Files From A Survey
from aalibrary.ingestion import download_raw_file_from_ncei
file_names = ["2107RL_CW-D20210813-T220732.raw",
              "2107RL_CW-D20210706-T172335.raw"]
# Download each file in turn; the call must be indented inside the loop body.
for file_name in file_names:
    download_raw_file_from_ncei(
        file_name=file_name,
        file_type="raw",
        ship_name="Reuben_Lasker",
        survey_name="RL2107",
        echosounder="EK80",
        data_source="NCEI",
        file_download_directory=".",
        upload_to_gcp=True,  # Set to True if you want to upload the raw file to gcp
        debug=False)
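When downloading many files in a loop like the one above, a single failure would otherwise stop the whole batch. The sketch below shows one way to keep going and collect the failures. `download_many` is a hypothetical helper, not part of AALibrary, and `fake_download` is a stand-in used only so the example runs without network access; which exceptions the library raises is not assumed, so the `except` clause is deliberately broad.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def download_many(
    file_names: Iterable[str],
    download_one: Callable[[str], None],
) -> Tuple[List[str], Dict[str, Exception]]:
    """Call download_one for each file name, recording successes and failures."""
    succeeded: List[str] = []
    failed: Dict[str, Exception] = {}
    for file_name in file_names:
        try:
            download_one(file_name)
        except Exception as exc:  # broad on purpose: the library's exception types are not assumed
            failed[file_name] = exc
        else:
            succeeded.append(file_name)
    return succeeded, failed


def fake_download(file_name: str) -> None:
    """Stand-in downloader for demonstration only; rejects non-.raw names."""
    if not file_name.endswith(".raw"):
        raise ValueError(f"not a raw file: {file_name}")


succeeded, failed = download_many(
    ["2107RL_CW-D20210813-T220732.raw", "notes.txt"],
    fake_download,
)
print(succeeded)       # ['2107RL_CW-D20210813-T220732.raw']
print(sorted(failed))  # ['notes.txt']
```

In real use, `download_one` could be a small wrapper (for example a `lambda` or `functools.partial`) around `download_raw_file_from_ncei` with the ship, survey, and echosounder arguments from the loop above filled in.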
Downloading A File Directly From GCP
AALibrary also lets users download a file directly from GCP, although this requires a more detailed, lower-level call. This can be any file within the GCP bucket. See the example below.
NOTE
This procedure bypasses all of the parameter verification & validations provided by AALibrary. Use with caution.
from aalibrary.utils.cloud_utils import download_file_from_gcp, setup_gcp_storage_objs
from aalibrary.utils.helpers import parse_correct_gcp_storage_bucket_location
# Create a GCP bucket object
gcp_stor_client, gcp_bucket_name, gcp_bucket = setup_gcp_storage_objs()
# Get the GCP Storage bucket location for the file.
gcp_storage_bucket_location = parse_correct_gcp_storage_bucket_location(
    file_name="2107RL_CW-D20210707-T103342.raw",
    file_type="raw",
    ship_name="Reuben_Lasker",
    survey_name="RL2107",
    echosounder="EK80",
    data_source="NCEI",
    is_metadata=False,
    debug=False,
)
# This function takes care of downloading the file.
download_file_from_gcp(
    gcp_bucket=gcp_bucket,
    blob_file_path=gcp_storage_bucket_location,
    local_file_path="./",
    debug=False,
)