New Dataset Naming Convention for ISCCP (2001)
COMMENTS: The old dataset naming convention concerned data TAPES of a nearly universal form and their manipulation with mainframe operating systems. The new convention concerns data FILES for ftp over the Internet, their storage on disc drives and a wide variety of media, and their manipulation with UNIX workstations. There will be no duplicate file names: that is, if two files are found to have the same name, they will be identical in content. The proposed new naming convention assumes groupings of data files into directories at monthly time steps and by satellite where appropriate (there is one very small dataset for which the directory will cover a whole year). The proposed new naming convention tries to preserve as much of the old convention as possible and tries to be as informative as possible. To that end, each transmission of a directory will be accompanied by a number of ancillary meta-data files (many of which are fixed) that, together with the data files, provide a set of files that resembles the structure of the old data tapes. In addition, each ftp transmission session will include a LOG file that gives a one-line summary of each data file sent: file name, file size (number of bytes), checksum and md5 output. This file will be used to verify proper transmission of the data and then discarded. AC and BC data will not use this naming convention since an Internet form for these data already exists and does not need to be changed. Stage A data are not included in this convention and are assumed to remain in the original format with the original labeling for each data center. The C-series cloud products will not be modified to change to this new naming convention.
FILE Name = ISCCP.TTTTTT.V.SATIDNN.YYYY.MM.DD.HHMM.DCN
(maximum length = 44 characters), where TTTTTT, SATIDNN and DCN are of variable length, all other fields are of fixed length, and the number of fields is fixed at 9.
- TTTTTT = Data Type = B1, B2, XXOA, B3, BT, XXTOC, B3LWMAP, SN, SI, TOVS, IS, TVX, DX, D1, D2, XXGRID, XXREAD, XXREADME
- XXOA = Orbit & Attitude files for each type of data where needed (XX = B1, B2)
- XXTOC = OPTIONAL Table of Contents (second file in directory) summarizing image data information for each type of data (currently XX = B1, B3, BT, DX, D1, D2, TV, IS)
- B3LWMAP = fixed Land/Water Map information accompanying B3 data
- SN = Snow cover data from NOAA, SI = Sea Ice cover data from NSIDC, IS = ISCCP merged snow/sea ice dataset
- TOVS = TOVS, RTOB or ATOB = TOVS data from NOAA; TVX = ISCCP version of TOVS (each monthly TV directory will contain a climatology meta-data file = TVC, the monthly file = TVM, and the daily files = TVD)
- XXGRID = fixed ancillary file giving map Grid information (XX = IS, TV, D1, D2)
- XXREAD = READ-programs for each type of ISCCP data product (XX = B3, BT, IS, TV, DX, D1, D2); date field indicates the date of last change
- XXREADME = fixed meta-data file (first file in directory), adapted from the original tape header files for ISCCP data products (XX = B3, BT, IS, TV, DX, D1 and D2)
- V = Version number (always one number)
- SATIDNN = Satellite identifier and number:
- GOE-5 (6, 7, 8, 9, 10, 11)
- MET-2 (3, 4, 5, 6, 7, later MSG)
- GMS-1 (2, 3, 4, 5, later MTSAT)
- INS-1
- FY2-B
- NOA-7 (8, 9, 10, 11, 12, 14, 15, 16), for polar DX data, the satellite number is extended to indicate the geographic sector (N=north pole, S=south pole, A-C = three 120 degree longitude sectors at low latitudes plus/minus 50 degrees in daytime, D-F = same sectors at night), e.g., NOA-15S
- NOAA TOVS uses SATIDNN = CP for composites of satellites
- GLOBAL for combined satellite (gridded) ISCCP products (IS, TV, D1 and D2)
- YYYY = Year = 1981...2005 (always 4 numbers, fill = 9999)
- MM = Month number = 1, 2, 3...12 (always 2 numbers, fill = 99)
- DD = Day of month = 1, 2, 3...31 (always 2 numbers, fill = 99)
- HHMM = Hour-Minute (always 4 characters, fill = 9999). For B1, B2 and XXOA data files, this field is the image start time to the nearest minute, where there is a leading zero for times less than 1000 on a 24-hour clock. If more than one data file has the same start time (to within one minute, i.e., polar orbiter data), the second and any subsequent files should have times ending in a letter (e.g., if there are three files with start times within one minute of 1215, they would have HHMM = 1215, 121b, 121c). For all other datasets, this is the Nominal Time of Day = 0000, 0300, ...2100 GMT.
Each type of data has an implicit time period covered: most of the datasets represent one instant of time within a 3-hr interval. For the SN, SI, IS, TVM and D2 datasets, these time intervals are one week, one week, 5 days, one month, and one month, respectively. For these datasets HHMM = 9999 and the date information indicates the beginning date of a time interval, with any unused time unit filled with 9's. Any climatology files may have the year filled with 9's but indicate a specific month, e.g., for the TVC dataset, YYYY.MM = 9999.03. For ancillary meta-data files (e.g., XXREADME, XXREAD, B3LWMAP and XXGRID), the date field will indicate the date of last change.
- DCN = Data Center Name: For B1, B2, XXOA, SN, SI and TOVS datasets, these names are NOA, CSU, MSC (used to be AES), JMA, EUM (used to be ESA), (CMA will be added later), and SCC as appropriate. For B3, BT and DX datasets (also the associated meta-data file, XXTOC), these names will be a set of names used at the GPC that are a combination of SPC names and some indication of satellite position (e.g., for METEOSAT-3 from CSU we use CME, for METEOSAT-5 over the Indian Ocean we use MTI, for morning NOAA polar orbiters we use NOM). For all other datasets, the DCN is GPC.
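The nine-field layout described above can be checked mechanically. The following is a minimal sketch in Python, not part of the convention itself: the function name parse_file_name, the IsccpName record and the example file names are hypothetical, and accepting only a lowercase letter as the last HHMM character is an assumption based on the 121b example.

```python
import re
from typing import NamedTuple

MAX_LENGTH = 44  # maximum file name length stated in the convention


class IsccpName(NamedTuple):
    """The eight variable fields that follow the fixed ISCCP prefix."""
    data_type: str
    version: str
    satellite: str
    year: str
    month: str
    day: str
    hhmm: str
    data_center: str


# Fixed-length fields and the pattern each must match.
# Assumption: a duplicate start time is marked by a lowercase letter
# in the last HHMM position (e.g. 121b).
FIXED_FIELDS = {
    "version": r"[0-9]",
    "year": r"[0-9]{4}",
    "month": r"[0-9]{2}",
    "day": r"[0-9]{2}",
    "hhmm": r"[0-9]{3}[0-9a-z]",
}


def parse_file_name(name: str) -> IsccpName:
    """Split a file name into its fields, raising ValueError if malformed."""
    if len(name) > MAX_LENGTH:
        raise ValueError(f"name longer than {MAX_LENGTH} characters: {name}")
    parts = name.split(".")
    if len(parts) != 9 or parts[0] != "ISCCP":
        raise ValueError(f"expected 9 dot-separated fields starting with ISCCP: {name}")
    fields = IsccpName(*parts[1:])
    for field, pattern in FIXED_FIELDS.items():
        if not re.fullmatch(pattern, getattr(fields, field)):
            raise ValueError(f"bad {field} field in {name}")
    return fields


# Illustrative names only; they are not taken from an actual transmission.
print(parse_file_name("ISCCP.D1.0.GLOBAL.1999.07.15.0300.GPC").satellite)  # GLOBAL
print(parse_file_name("ISCCP.B1.0.NOA-14.1999.07.15.121b.NOA").hhmm)       # 121b
```

The sketch validates only the fixed-length fields; the variable-length fields (TTTTTT, SATIDNN, DCN) are returned as found, since their allowed values are expected to grow over time.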
ISCCP ftp Protocols
Each SPC will "push" data to the GPC and ICA and the GPC will "push" data to the ICA (and NASA Langley ASDC). For this purpose and to allow checking and revision of files, each center will have an account on the target center's ftp server. The data will be transmitted uncompressed (unless compression is absolutely necessary in specific cases). Each file transmitted will be named using the filename convention described above. Each ftp session may transmit a single file or multiple files. Data files will be organized in monthly directories. Directory names will follow the filename convention, except that the DD and HHMM fields will be dropped:
DIRECTORY Name = ISCCP.TTT.V.SATID.YYYY.MM.DCN
The version number will be reserved for use to re-transmit the same dataset if it is changed at a later date. For IS data, MM = 99; that is, this directory will contain a whole year of data. TTT in these directories will only include actual data types (B1, B2, B3, BT, SN, SI, TOVS, IS, TV, DX, D1, D2), not the meta-data types (XXOA, XXTOC, B3LWMAP, XXGRID, XXREAD, XXREADME).
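The relation between a data file name and its directory name can be sketched as follows. This is an illustration under stated assumptions, not a prescribed implementation: the function name directory_name and the example file names are hypothetical, and mapping the TVM and TVD file types to the TV directory type is inferred from the statement that the monthly TV directory holds those files. Meta-data types (and the TVC climatology file, whose own date field is filled) are rejected because their directory cannot be derived from the file name alone.

```python
# Actual data types allowed in directory names, as listed in the convention.
DIRECTORY_TYPES = {"B1", "B2", "B3", "BT", "SN", "SI", "TOVS",
                   "IS", "TV", "DX", "D1", "D2"}

# Assumption: monthly (TVM) and daily (TVD) files belong to the TV directory.
FILE_TO_DIRECTORY_TYPE = {"TVM": "TV", "TVD": "TV"}


def directory_name(file_name: str) -> str:
    """Derive the directory name by dropping the DD and HHMM fields."""
    parts = file_name.split(".")
    if len(parts) != 9:
        raise ValueError(f"expected 9 dot-separated fields: {file_name}")
    prefix, dtype, version, sat, year, month, _day, _hhmm, dcn = parts
    dtype = FILE_TO_DIRECTORY_TYPE.get(dtype, dtype)
    if dtype not in DIRECTORY_TYPES:
        raise ValueError(f"no directory type for meta-data type {dtype}")
    if dtype == "IS":
        month = "99"  # the IS directory covers a whole year
    return ".".join([prefix, dtype, version, sat, year, month, dcn])


# Illustrative names only.
print(directory_name("ISCCP.D1.0.GLOBAL.1999.07.15.0300.GPC"))
# ISCCP.D1.0.GLOBAL.1999.07.GPC
print(directory_name("ISCCP.IS.0.GLOBAL.1999.07.05.9999.GPC"))
# ISCCP.IS.0.GLOBAL.1999.99.GPC
```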
The files transmitted in each ftp session will be accompanied by a LOG file that is named:
ISCCP.XXFTPLOG.YYYY.MM.DD.HHMM.DCN
Where XX is the data type and the date and time fields can be filled with 9's as appropriate to the frequency of transmission sessions. The LOG file will contain a one-line summary for each file in the session:
filename, filesize (number of bytes), checksum, md5 output
Inclusion of md5 is optional but desired. If data compression is used, the contents of the LOG file will refer to the uncompressed data. If ftp data transmissions from the SPC are routine and on schedule, then no notification of data transmission is required. All data for a given month must be transmitted to the GPC and ICA by the 15th of the following month and a report (the current monthly report matrix) sent by e-mail by the same date. If data will be transmitted late or replaced after the cutoff date, then notification is required. The GPC and ICA will cross-check data files received against the monthly report matrices and notify each SPC within one to two working days that all files have been successfully received. The GPC will notify both the ICA and NASA Langley ASDC when data products are ready for transmission; this notification will also go to all data centers so that anyone may collect copies of the ISCCP data products.
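A one-line LOG summary of the kind described above (filename, filesize, checksum, md5 output) could be produced along the following lines. This is a sketch only: the text does not say which checksum algorithm or field separator is intended, so the use of CRC-32 (via zlib) as the checksum and of ", " as the separator are assumptions, and the function names log_line and write_log are hypothetical.

```python
import hashlib
import zlib
from pathlib import Path
from typing import Iterable

CHUNK_SIZE = 1 << 20  # read files in 1 MiB pieces to limit memory use


def log_line(path: Path, include_md5: bool = True) -> str:
    """Return 'filename, filesize, checksum[, md5]' for one data file."""
    size = 0
    crc = 0  # assumption: CRC-32 stands in for the unspecified checksum
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            size += len(chunk)
            crc = zlib.crc32(chunk, crc)
            md5.update(chunk)
    fields = [path.name, str(size), f"{crc:08x}"]
    if include_md5:  # md5 is optional but desired
        fields.append(md5.hexdigest())
    return ", ".join(fields)


def write_log(paths: Iterable[Path], log_path: Path) -> None:
    """Write one summary line per transmitted file to the LOG file."""
    with open(log_path, "w") as log:
        for path in paths:
            log.write(log_line(path) + "\n")
```

The receiving center could recompute the same line for each file it receives and compare it with the corresponding LOG entry to verify the transmission. If compression is used, the lines would be computed on the uncompressed files.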