Chapter 8 Get Data
8.1 Environmental Data
8.1.1 ECMWF ERA5 Data
Go back to fan’s Python Code Examples Repository (bookdown site) or the pyfan Package (API).
8.1.1.1 Basic Conda Setup
Download Anaconda for Python 3. For more involved conda instructions see here
Open up Anaconda Prompt with admin rights (right click, choose Run as administrator).
# Inside anaconda prompt
where python
where anaconda
# C:/ProgramData/Anaconda3/Scripts/anaconda.exe
# C:/ProgramData/Anaconda3/python.exe
Add to Path
Install cdsapi and eccodes
conda config --add channels conda-forge
conda install cdsapi -y
conda install -c anaconda pandas
conda install -c conda-forge eccodes -y
conda install -c conda-forge cfgrib -y
conda install -c anaconda xarray
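As a quick sanity check (a minimal sketch, not part of the original setup), confirm that the installed packages import in this environment:

# Optional sanity check: these imports should succeed after the installs above
import cdsapi
import cfgrib
import pandas
import xarray
print(pandas.__version__, xarray.__version__)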
8.1.1.2 Account Registration
Register for an account
Go to your CDS user page and copy the url and key:
- this has UID, 4XXXX, and API KEY, 4XXXfXXX-XXXf-4XXX-9XXX-7XXXebXXfdXX
- together they should look like: 4XXXX:4XXXfXXX-XXXf-4XXX-9XXX-7XXXebXXfdXX
Open up an editor (Notepad++ for example), create an empty file, and paste the url and your UID:APIKEY into the file as below. Save the file as C:/Users/fan/.cdsapirc. Note that .cdsapirc is the file name, and you are saving it under the directory C:/Users/fan/ (your user root).
url: https://cds.climate.copernicus.eu/api/v2
key: 4XXXX:4XXXfXXX-XXXf-4XXX-9XXX-7XXXebXXfdXX
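With the file in place, the client picks up the credentials automatically. A minimal sketch to check (run in any Python session with cdsapi installed):

# Minimal check: cdsapi.Client() reads url and key from C:/Users/fan/.cdsapirc
import cdsapi
c = cdsapi.Client()
print(c.url)  # should print https://cds.climate.copernicus.eu/api/v2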
8.1.1.3 Run API Request via Jupyter Notebook
Open up Jupyter Notebook (this opens a browser page):
- cd "C:/Users/fan/Downloads"
- jupyter notebook
Create a new Python 3 notebook somewhere you like.
Name the file cdstest (saved as an ipynb file).
Paste the code below inside the ipynb file you opened (modify spt_root):
import cdsapi
import urllib.request

# download folder
spt_root = "C:/Users/fan/downloads/_data/"
spn_dl_test_grib = spt_root + "test_china_temp.grib"

# request
c = cdsapi.Client()
res = c.retrieve("reanalysis-era5-pressure-levels",
                 {'product_type': 'reanalysis',
                  'variable': 'temperature',
                  'pressure_level': '1000',
                  'year': '2008',
                  'month': '01',
                  'day': '01',
                  'time': '12:00',
                  'area': [53.31, 73, 4.15, 135],
                  'grid': [1.0, 1.0],
                  'format': 'grib'},
                 spn_dl_test_grib)

# show results
print('print results')
print(res)
print(type(res))
Click Run.
8.1.1.4 Run API Request via IPython
- In Anaconda Prompt: ipython
- Open a file in notepad++ or elsewhere, copy the code above over, and edit spt_root to reflect your directories
- Select the entire code in the notepad++ page, and copy all lines
- Now inside IPython, paste and run with the magic command: %paste
- This should run the code above and save the grib file in the folder you specified with the name you specified.
8.1.1.5 Convert GRIB Data to CSV
Inside the conda prompt, cd into the folder where you downloaded the grib file.
grib_ls shows what is in the grib file.
grib_get_data translates grib to csv.
cd "C:/Users/fan/downloads/_data/" grib_ls test_china_temp.grib grib_get_data test_china_temp.grib > test_china_temp_raw.csv
8.1.1.6 More Advanced Download Setup and Instructions
8.1.1.6.1 Conda Environment and Installation
In conda, set up a conda environment for downloading ECMWF data using the ECMWF API. (Conda Set-up)
# Set up
conda deactivate
conda env list
conda env remove -n wk_ecmwf
conda create -n wk_ecmwf -y
conda activate wk_ecmwf
# Add conda-forge to channel in env
conda config --env --add channels conda-forge
conda config --get channels
conda config --get channels --env
# Install
conda install cdsapi -y
conda install -c anaconda pandas
conda install -c conda-forge eccodes -y
conda install -c conda-forge cfgrib -y
conda install -c anaconda xarray
This creates the conda env that we are using here for Python.
8.1.1.6.2 Config File .cdsapirc
Open up the .cdsapirc file, creating it if it does not exist. Below, we generate the config text and save it to the file. See Python Reading and Writing to File Examples.
First, get the text for the config file:
= """\
stf_cds_cdsapirc url: https://cds.climate.copernicus.eu/api/v2
key: 4XXXX:4XXXfXXX-XXXf-4XXX-9XXX-7XXXebXXfdXX\
"""
print(stf_cds_cdsapirc)
Second, save the text to file:
# File name and path
spt_file_cds = "C:/Users/fan/"
snm_file_cds = ".cdsapirc"
spn_file_cds = spt_file_cds + snm_file_cds
# Open new file
fl_cdsapirc_contents = open(spn_file_cds, 'w')
# Write to File
fl_cdsapirc_contents.write(stf_cds_cdsapirc)
# Close
fl_cdsapirc_contents.close()

# Open the config file to check
code "C:/Users/fan/.cdsapirc"
8.1.1.7 Generate API Requests
Go to the sites below, choose download data, pick what is needed, and then select Show API request at the bottom of page:
ERA5 pressure levels from 1979 to present
ERA5 single levels from 1979 to present
8.1.1.7.1 API Request China Temp Test
API function is here.
Select some testing data over China's area and download a grib file. The data is for 2008, January 1st, at 12:00.
Open up Jupyter notebook: jupyter notebook
# import module in conda env wk_ecmwf
import cdsapi
import urllib.request

# download folder
spt_root = "C:/Users/fan/py4econ/vig/getdata/envir/"
spn_dl_test_grib = spt_root + "_data/test/test_china_temp.grib"

# request
c = cdsapi.Client()
res = c.retrieve("reanalysis-era5-pressure-levels",
                 {'product_type': 'reanalysis',
                  'variable': 'temperature',
                  'pressure_level': '1000',
                  'year': '2008',
                  'month': '01',
                  'day': '01',
                  'time': '12:00',
                  'area': [53.31, 73, 4.15, 135],
                  'grid': [1.0, 1.0],
                  'format': 'grib'},
                 spn_dl_test_grib)

# show results
print('print results')
print(res)
print(type(res))

# download
# response = urllib.request.urlopen('http://www.example.com/')
# html = response.read()
2020-06-17 23:51:35,107 INFO Welcome to the CDS
2020-06-17 23:51:35,107 INFO Sending request to https://cds.climate.copernicus.eu/api/v2/resources/reanalysis-era5-pressure-levels
2020-06-17 23:51:36,441 INFO Request is queued
2020-06-17 23:51:39,183 INFO Request is running
2020-06-17 23:51:45,059 INFO Request is completed
2020-06-17 23:51:45,060 INFO Downloading http://136.156.133.25/cache-compute-0008/cache/data2/adaptor.mars.internal-1592455900.8655114-11162-11-68e1ea23-8985-4926-95e6-9f181cc77927.grib to C:/Users/fan/pyfan/vig/getdata/envir/_data/test/test_china_temp.grib (6.3K)
2020-06-17 23:51:45,441 INFO Download rate 16.6K/s
print results
Result(content_length=6480,content_type=application/x-grib,location=http://136.156.133.25/cache-compute-0008/cache/data2/adaptor.mars.internal-1592455900.8655114-11162-11-68e1ea23-8985-4926-95e6-9f181cc77927.grib)
<class 'cdsapi.api.Result'>
Convert grib to raw csv, open up command line:
cd "C:/Users/fan/pyfan/vig/getdata/envir/_data/test/"
grib_ls test_china_temp.grib
grib_get_data test_china_temp.grib > test_china_temp_raw.csv
Format the CSV file (the raw output is not comma separated):
spt_root = "C:/Users/fan/pyfan/vig/getdata/envir/_data/test/"
spn_csv_raw = spt_root + "test_china_temp_raw.csv"
spn_csv_edi = spt_root + "test_china_temp.csv"

with open(spn_csv_raw, 'r') as f_in, open(spn_csv_edi, 'w') as f_out:
    # copy the header line over
    f_out.write(next(f_in))
    # comma-join the whitespace-separated values on each remaining line
    for line in f_in:
        f_out.write(','.join(line.split()) + '\n')
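The same reformatting can be sketched with pandas, assuming the raw file is whitespace-separated throughout (header tokens may need cleanup if grib_get_data emits commas in the header):

# Sketch: parse the whitespace-separated raw file and write a proper CSV
import pandas as pd

df = pd.read_csv(spn_csv_raw, sep=r"\s+")
df.to_csv(spn_csv_edi, index=False)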
Show CSV results:
# Path and Read
spt_root <- "C:/Users/fan/pyfan/vig/getdata/envir/"
spn_dl_test_csv <- paste0(spt_root, "_data/test/test_china_temp.csv")
china_weather_data <- read.csv(spn_dl_test_csv)

# Top 50 rows
kable(head(china_weather_data, 50),
      caption="Chinese Long and Lat, Temperature Pressure, 2008 Jan 1st") %>%
  kable_styling_fc()
time | latitude | longitude | u10 | v10 | d2m | t2m | msl | sp |
---|---|---|---|---|---|---|---|---|
2008-01-01 02:00:00 | 23.25 | 113 | -2.6031342 | -4.829605 | 265.4314 | 284.8363 | 102918.8 | 102616.0 |
2008-01-01 02:00:00 | 23.25 | 114 | -2.7173920 | -3.808121 | 262.7693 | 284.2719 | 102862.1 | 101628.0 |
2008-01-01 02:00:00 | 22.25 | 113 | -2.6246185 | -6.311050 | 266.9294 | 284.2602 | 102796.6 | 102059.0 |
2008-01-01 02:00:00 | 22.25 | 114 | -2.6285248 | -6.152847 | 267.4978 | 285.3168 | 102710.1 | 102201.0 |
2008-01-01 12:00:00 | 23.25 | 113 | -1.1495056 | -2.728592 | 265.8091 | 286.1729 | 102588.9 | 102290.7 |
2008-01-01 12:00:00 | 23.25 | 114 | -1.4454040 | -2.477615 | 265.3033 | 285.0987 | 102591.6 | 101374.7 |
2008-01-01 12:00:00 | 22.25 | 113 | -0.6924744 | -4.270584 | 268.0396 | 286.5753 | 102482.9 | 101757.7 |
2008-01-01 12:00:00 | 22.25 | 114 | -1.9668884 | -4.906326 | 266.7486 | 288.0030 | 102440.1 | 101940.7 |
“ERA5 is a comprehensive reanalysis, from 1979 (soon to be backdated to 1950) to near real time, which assimilates as many observations as possible in the upper air and near surface. The ERA5 atmospheric model is coupled with a land surface model and a wave model.”
- Register for an account
- Agree to Licence
8.1.1.8 Learning
8.1.1.8.2 Single Level Parameters
- Table 1: surface and single level parameters: invariants
- Table 9: pressure level parameters: instantaneous
- Temperature
8.1.1.9 UTCI, NC Format Data, Download, Unzip, Convert to combined CSV
The data downloaded from CDS climate can become very large. We want to process the data one part at a time, summarize and aggregate over each part, and generate an output file with aggregate statistics over the entire time period of interest.
The code below accomplishes the following tasks:
- download data from derived-utci-historical as ZIP: API request by itself
- unzip
- convert nc files to csv files
- individual csv files are half year groups
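For orientation, here is a compact Python sketch of the same download-unzip-convert pipeline (the file and folder names utci.zip and utci_nc are hypothetical, and xarray needs a netCDF backend installed); the full R implementation with parameter controls follows below:

# Sketch: download ZIP from derived-utci-historical, unzip, convert nc to csv
import glob
import zipfile
import cdsapi
import xarray as xr

c = cdsapi.Client()
c.retrieve('derived-utci-historical',
           {'format': 'zip',
            'variable': 'Universal thermal climate index',
            'product_type': 'Consolidated dataset',
            'year': '2005', 'month': ['01'], 'day': ['01', '03'],
            'area': [53.31, 73, 4.15, 135]},
           'utci.zip')
with zipfile.ZipFile('utci.zip') as z:
    z.extractall('utci_nc')
for spn_nc in glob.glob('utci_nc/*.nc'):
    df = xr.open_dataset(spn_nc).to_dataframe().reset_index()
    df.to_csv(spn_nc.replace('.nc', '.csv'), index=False)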
Parameter Control for the code below:
- spt_root: root folder where everything will be at
- spth_conda_env: the Python path of the conda virtual environment in which the eccodes and cdsapi packages are installed; in the example below, the env is wk_ecmwf
- st_nc_prefix: the downloaded nc file names wrap a date string with a prefix and a suffix; this is the string before the date (see the naming sketch after this list)
- st_nc_suffix: the string after the date
- ar_years: array of years to download and aggregate over
- ar_months_g1: months to download in the first half year
- ar_months_g2: months to download in the second half year
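To make the prefix and suffix concrete, a small sketch of how each nc file name wraps a date string (using the example name that appears in the code comments below):

# Illustration: prefix + date + suffix form each downloaded nc file name
st_nc_prefix = "ECMWF_utci_"
st_nc_suffix = "_v1.0_con.nc"
snm_file = st_nc_prefix + "20100702" + st_nc_suffix
print(snm_file)  # ECMWF_utci_20100702_v1.0_con.nc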
#################################################
# ------------ Parameters
#################################################
# Where to store everything
spt_root <- "C:/Users/fan/Downloads/_data/"
spth_conda_env <- "C:/ProgramData/Anaconda3/envs/wk_ecmwf/python.exe"

# nc name prefix and suffix
st_nc_prefix <- "ECMWF_utci_"
st_nc_suffix <- "_v1.0_con.nc"

# Years list
# ar_years <- 2001:2019
ar_years <- c(2005, 2015)
# ar_months_g1 <- c('01','02','03','04','05','06')
ar_months_g1 <- c('01', '03')
# ar_months_g2 <- c('07','08','09','10','11','12')
ar_months_g2 <- c('07', '09')

# folder to download any nc zips to
nczippath <- spt_root
# we are changing the python api file with different request strings and storing it here
pyapipath <- spt_root
# output directory for AGGREGATE CSV with all DATES from this search
csvpath <- spt_root
#################################################
# ------------ Packages
#################################################
library("ncdf4")
library("chron")
library("lattice")
library("RColorBrewer")
library("stringr")
library("tibble")
library("dplyr")
Sys.setenv(RETICULATE_PYTHON = spth_conda_env)
library("reticulate")
#################################################
# ------------ Define Loops
#################################################
for (it_yr in ar_years) {
  for (it_mth_group in c(1,2)) {

    if (it_mth_group == 1) {
      ar_months <- ar_months_g1
    }
    if (it_mth_group == 2) {
      ar_months <- ar_months_g2
    }

    #################################################
    # ------------ Define Python API Call
    #################################################
    # name of zip file
    nczipname <- "derived_utci_2010_2.zip"
    unzipfolder <- "derived_utci_2010_2"

    st_file <- paste0("import cdsapi
import urllib.request
# download folder
spt_root = '", nczippath, "'
spn_dl_test_grib = spt_root + '", nczipname, "'
# request
c = cdsapi.Client()
res = c.retrieve(
    'derived-utci-historical',
    {
        'format': 'zip',
        'variable': 'Universal thermal climate index',
        'product_type': 'Consolidated dataset',
        'year': '", it_yr, "',
        'month': [
            ", paste("'", ar_months, "'", sep = "", collapse = ", "), "
        ],
        'day': [
            '01','03'
        ],
        'area' : [53.31, 73, 4.15, 135],
        'grid' : [0.25, 0.25],
    },
    spn_dl_test_grib)
# show results
print('print results')
print(res)
print(type(res))")
    # st_file = "print(1+1)"

    # Store Python Api File
    fl_test_tex <- paste0(pyapipath, "api.py")
    fileConn <- file(fl_test_tex)
    writeLines(st_file, fileConn)
    close(fileConn)
    #################################################
    # ------------ Run Python File
    #################################################
    # Set Path
    setwd(pyapipath)
    # Run py file, api.py name just defined
    use_python(spth_conda_env)
    source_python('api.py')

    #################################################
    # ------------ Unzip
    #################################################
    spn_zip <- paste0(nczippath, nczipname)
    spn_unzip_folder <- paste0(nczippath, unzipfolder)
    unzip(spn_zip, exdir=spn_unzip_folder)

    #################################################
    # ------------ Find All Files
    #################################################
    # Get all files with nc suffix in folder
    ncpath <- paste0(nczippath, unzipfolder)
    ls_sfls <- list.files(path=ncpath, recursive=TRUE, pattern=".nc", full.names=TRUE)
    #################################################
    # ------------ Combine Individual NC Files to Joint Dataframe
    #################################################
    # List to gather dataframes
    ls_df <- vector(mode = "list", length = length(ls_sfls))

    # Loop over files and convert nc to csv
    it_df_ctr <- 0
    for (spt_file in ls_sfls) {
      it_df_ctr <- it_df_ctr + 1

      # Get file name without path
      # Given this structure: ECMWF_utci_20100702_v1.0_con, sub out prefix and suffix
      snm_file_date <- sub(paste0('\\', st_nc_suffix, '$'), '', basename(spt_file))
      snm_file_date <- sub(st_nc_prefix, '', basename(snm_file_date))

      # Dates Start and End: list.files is auto sorted in ascending order
      if (it_df_ctr == 1) {
        snm_start_date <- snm_file_date
      } else {
        # this will give the final date
        snm_end_date <- snm_file_date
      }

      print(spt_file)
      ncin <- nc_open(spt_file)

      nchist <- ncatt_get(ncin, 0, "history")

      # not using this missing value flag at the moment
      missingval <- str_match(nchist$value, "setmisstoc,\\s*(.*?)\\s* ")[,2]
      missingval <- as.numeric(missingval)

      lon <- ncvar_get(ncin, "lon")
      lat <- ncvar_get(ncin, "lat")
      tim <- ncvar_get(ncin, "time")
      tunits <- ncatt_get(ncin, "time", "units")

      nlon <- dim(lon)
      nlat <- dim(lat)
      ntim <- dim(tim)

      # convert time -- split the time units string into fields
      # tustr <- strsplit(tunits$value, " ")
      # tdstr <- strsplit(unlist(tustr)[3], "-")
      # tmonth <- as.integer(unlist(tdstr)[2])
      # tday <- as.integer(unlist(tdstr)[3])
      # tyear <- as.integer(unlist(tdstr)[1])
      # mytim <- chron(tim, origin = c(tmonth, tday, tyear))

      # utci is in Kelvin, convert to Celsius
      tmp_array <- ncvar_get(ncin, "utci")
      tmp_array <- tmp_array - 273.15

      lonlat <- as.matrix(expand.grid(lon = lon, lat = lat, hours = tim))
      temperature <- as.vector(tmp_array)
      tmp_df <- data.frame(cbind(lonlat, temperature))

      # extract a rectangle
      eps <- 1e-8
      minlat <- 22.25 - eps
      maxlat <- 23.50 + eps
      minlon <- 113.00 - eps
      maxlon <- 114.50 + eps
      # subset data
      subset_df <- tmp_df[tmp_df$lat >= minlat & tmp_df$lat <= maxlat &
                            tmp_df$lon >= minlon & tmp_df$lon <= maxlon, ]

      # add Date
      subset_df_date <- as_tibble(subset_df) %>% mutate(date = snm_file_date)

      # Add to list
      ls_df[[it_df_ctr]] <- subset_df_date

      # Close NC
      nc_close(ncin)
    }
    # List of DF to one DF
    df_all_nc <- do.call(rbind, ls_df)

    # Save File
    fname <- paste0(st_nc_prefix, snm_start_date, "_to_", snm_end_date, ".csv")
    csvfile <- paste0(csvpath, fname)
    write.table(na.omit(df_all_nc), csvfile, row.names = FALSE, sep = ",")

    # Delete folders
    unlink(spn_zip, recursive=TRUE, force=TRUE)
    unlink(spn_unzip_folder, recursive=TRUE, force=TRUE)

    # end loop months groups
  }
  # end loop year
}
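To verify the aggregated output, read one combined CSV back; a minimal sketch in Python (the file name is hypothetical, following the prefix, start date, and end date pattern constructed in the loop above):

# Sketch: read back an aggregated CSV and summarize by date
import pandas as pd

df = pd.read_csv("C:/Users/fan/Downloads/_data/ECMWF_utci_20050101_to_20050303.csv")
print(df.shape)
print(df.groupby('date')['temperature'].mean())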