DEOS Data Ingest Components Operations Guide

DEOS Technical Note #3

Dr. Geoffrey E. Quelch

Research Fellow,
University of Delaware,
Center for Climatic Research


   University of Delaware
   Newark, DE 19716

Version 2

All material herein is copyright by the Delaware Environmental Observing System

Published: February 27th 2004

Revision History
Revision 1.1     2004/02/27  GEQ
Initial version
Revision 1.2     2004/03/02  GEQ
Updated database description section.
Revision 1.3     2004/03/05  GEQ
Added historical processing information.
Revision 1.4     2005/02/02  GEQ
Added stream table.
Revision 1.5     2005/03/08  GEQ
Added update option.
Revision 1.6     2005/05/31  GEQ
Added file action option.
Revision 1.7     2005/08/09  GEQ
Added discussion of stream state to database parameters and operational use sections.
Revision 1.8     2006/01/23  GEQ
Added discussion of flags and related issues. Added DelDOT-Main stream.
Revision 2.1     2006/09/26  GEQ
Incremented version number of this document to 2.
Revision 2.2     2007/04/02  GEQ
Added DGS well streams.
Revision 2.3     2007/05/17  GEQ
Added records option. Added DNERR streams. Added append option.
Revision 2.4     2007/07/27  GEQ
Added USACE streams.
Revision 2.1.7   2008/08/14  GEQ
Added statistics option.
Revision 2.1.8   2010/01/31  GEQ
Added NOS streams.
Revision 2.1.11  2011/11/28  GEQ
Added idnwq program and DNREC-Water_Quality stream.

Abstract

This technical note describes the DEOS ingest and idnwq programs, and the options for their use.


Table of Contents

Introduction
Streams
The Ingest Program
Running the Ingest Program
Command Line Options
Database Parameters
Operational Use
Historical Processing
The idnwq Program
Running the idnwq Program
Command Line Options
Input File Format
Database Parameters

Introduction

The DEOS ingest and idnwq programs parse data files containing meteorological data and store the data in the database for use by subsequent components of the DEOS system. The programs also implement QA/QC procedures.

Streams

The ingest program organizes its input into streams. The following table describes the supported streams.

Table 1. Ingest Streams

Stream Name                 Description
NEXRAD                      The only file type ingested at present is the digital precipitation array.
DEOS
METAR-SFSS                  A single-station dataset in a single file, as provided on the NWS web site.
USGS-STREAM                 USGS stream-flow stations.
USGS-TIDAL                  USGS tidal stations.
RAWS                        NWS fire weather stations.
NDBC-BUOY                   NDBC ocean buoy stations.
DelDOT-Main                 State of Delaware Department of Transportation stations.
DEOS-Wells                  Delaware Geological Survey well data passed by DEOS communications.
DGS-Wells                   Delaware Geological Survey well data retrieved from web resources.
DNERR-WQ-Multi              Delaware National Estuarine Research Reserve multi-station water quality data.
DNERR-Buoy-Status           Delaware National Estuarine Research Reserve buoy status data.
DNERR-Buoy-Wave             Delaware National Estuarine Research Reserve buoy wave data.
USACE-Buoy-Current          US Army Corps of Engineers buoy data (current data).
USACE-Buoy-Historical       US Army Corps of Engineers buoy data (historical data).
NOS-Water-Level-Prediction  NOS data (prediction data).
NOS-Water-Level-Obs         NOS data (observed data).
DNREC-Water_Quality         DNREC water quality data.

The Ingest Program

The ingest program is the main method for entering station data into the DEOS database.

Running the Ingest Program

The program is executed by the command ingest from the command line.

ingest { -u username | --user username } { -p password | --password password } { -d database | --database database } { -s stream-name | --stream stream-name } [ --update | --no-update ] [ -a action | --file-action action ] [ --append ] [ --statistics ] [ --records value ] [ -v value | --verbose value ] [ -h | --help ]
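As an illustration of the synopsis above, the following Python sketch assembles an ingest command line from named settings. The helper name build_ingest_command and the sample user, password and database values are hypothetical; only the option names are taken from this document, and the sketch builds the command without running it.

```python
import shlex


def build_ingest_command(user, password, database, stream,
                         update=None, file_action=None, append=False,
                         statistics=False, records=None, verbose=None):
    """Return the argument list for one ingest run (hypothetical helper)."""
    if file_action is not None and file_action not in ("n", "m", "d", "c"):
        raise ValueError("file_action must be one of n, m, d, c")
    # The four required options.
    args = ["ingest", "--user", user, "--password", password,
            "--database", database, "--stream", stream]
    # update=None leaves the database setting in force.
    if update is True:
        args.append("--update")
    elif update is False:
        args.append("--no-update")
    if file_action is not None:
        args += ["--file-action", file_action]
    if append:
        args.append("--append")
    if statistics:
        args.append("--statistics")
    if records is not None:
        args += ["--records", str(records)]
    if verbose is not None:
        args += ["--verbose", str(verbose)]
    return args


# Sample values below are placeholders, not real DEOS credentials.
command = build_ingest_command("deos_user", "secret", "deos", "USGS-TIDAL",
                               update=False, file_action="c", verbose=2)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run on a host where the ingest program is installed.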

Command Line Options

The ingest program uses database configuration parameters to determine details of what to do with the various data streams it is presented.

User Option

The specification for the user option is:

{ -u username | --user username | --user=username }

This option will allow the program to connect to the database using the username provided. The user must already exist, and have sufficient permissions to access the database specified. This item is required.

Password Option

The specification for the password option is:

{ -p password | --password password | --password=password }

This option in combination with the username allows the program to connect to the database using the username provided. This item is required.

Database Option

The specification for the database option is:

{ -d database | --database database | --database=database }

This option will specify which database the program should use as a source of data, configuration items and a destination for any event logging items. This item is required.

Stream Option

The specification for the stream option is:

{ -s stream-name | --stream stream-name | --stream=stream-name }

This option specifies the name of the stream to process. This item is required.

Update Option

The specification for the update option is:

[--update]

This option overrides any database setting and forces the ingest program to update any data already existing in the database.

No-update Option

The specification for the no update option is:

[--no-update]

This option overrides any database setting and forces the ingest program not to update any data already existing in the database. This speeds up the ingest process and is typically used during bulk upload.

File Action Option

The specification for the file action option is:

[ -a { n | m | d | c } | --file-action { n | m | d | c } | --file-action= { n | m | d | c } ]

This option specifies what action is to be performed on the input file following successful processing. The possibilities are [n]othing, [m]ove, [d]elete or [c]opy. The default is to move the file to the designated archive directory. If a problem is detected during processing, the file will be moved to the appropriate unprocessed directory.
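The four file actions can be sketched in Python as follows. This is a minimal illustration, not the ingest program's own code: the function name apply_file_action is hypothetical, and the sketch assumes that a copy goes to the same archive directory as a move, which this document does not state.

```python
import shutil
from pathlib import Path


def apply_file_action(path, action="m", archive_dir="archive"):
    """Apply a post-processing action to a successfully processed file.

    action is one of n (nothing), m (move), d (delete) or c (copy).
    Returns the resulting location of the file, or None if it was deleted.
    """
    path = Path(path)
    archive_dir = Path(archive_dir)
    if action == "n":
        return path                      # leave the file where it is
    if action == "d":
        path.unlink()                    # remove the input file
        return None
    if action in ("m", "c"):
        archive_dir.mkdir(parents=True, exist_ok=True)
        target = archive_dir / path.name
        if action == "m":
            shutil.move(str(path), str(target))
        else:
            # Assumption: copies land in the archive directory as well.
            shutil.copy2(str(path), str(target))
        return target
    raise ValueError("action must be one of n, m, d, c")
```

The default argument mirrors the documented default of moving the file to the archive directory.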

Append Option

The specification for the append option is:

[ --append ]

This option, if provided, indicates that a timestamp of the current UTC time be appended to files after they are copied or moved. The default is to not add a timestamp.
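The effect of the append option on a file name might look like the following Python sketch. The timestamp layout (YYYYMMDDTHHMMSSZ, joined with a dot) and the function name timestamped_name are assumptions made for illustration; this document does not specify the format the ingest program uses.

```python
from datetime import datetime, timezone
from pathlib import Path


def timestamped_name(path, now=None):
    """Return path with a UTC timestamp appended to the file name."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumed layout; the real program may format the timestamp differently.
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    path = Path(path)
    return path.with_name(f"{path.name}.{stamp}")
```

Appending the time in this way keeps repeated deliveries of a file with the same name from overwriting one another in the archive directory.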

Statistics Option

The specification for the statistics option is:

[ --statistics ]

This option, if provided, indicates that the ingest program should update the metadata statistics recorded in the database.

Records Option

The specification for the records option is:

[ --records value | --records=value ]

This option specifies the maximum number of data records that will be processed. If the option is not provided, or if the value is zero, all data records will be processed.
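The record limit, including the special meaning of zero, can be expressed in a few lines of Python. This is a sketch of the documented behaviour only; the function name limit_records is hypothetical.

```python
from itertools import islice


def limit_records(records, max_records=0):
    """Yield at most max_records items; zero means no limit."""
    if max_records < 0:
        raise ValueError("max_records must not be negative")
    if max_records == 0:
        return iter(records)             # zero: process every record
    return islice(records, max_records)
```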

Verbose Option

The specification for the verbose option is:

[ -v value | --verbose value ]

The value provided for the verbose setting (a digit between 0 and 5) indicates how much output is generated, with 0 being minimal and 5 extremely verbose. Positive values send messages to the ELF; negative values also print the messages to the screen. If no value is provided, the setting is assumed to be 1.
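One way to read the verbose value is as a level plus a sign, as in the Python sketch below. The function name parse_verbosity is hypothetical, and treating the magnitude of a negative value as the level is an interpretation of the description above rather than a documented rule.

```python
def parse_verbosity(value=None):
    """Return (level, echo_to_screen) for a verbose setting.

    level is 0..5; echo_to_screen is True when the value was negative.
    """
    if value is None:
        return 1, False                  # documented default
    number = int(value)
    level = abs(number)                  # assumption: magnitude is the level
    if level > 5:
        raise ValueError("verbose level must be between 0 and 5")
    return level, number < 0
```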

Help Option

The specification for the help option is:

[ -h | --help ]

This option prints a summary of all the command line options and exits.

Database Parameters

The following database tables provide configuration options for the ingest program.

Networks

This table contains data for the network associated with the specified stream, such as the destination path and the network name and ID.

Streams

This table contains data for the specified ingest method for the stream, such as stream ID as well as the lock parameter discussed below.

Data_Types

This table contains data for the ingest data type (e.g., air temperature or precipitation, etc.), and minimum and maximum permitted values. Ingested items outside those bounds are flagged and thus not available for user viewing.

Station_Data

This table contains the ingested station data. The flag value inserted reflects the result of the QA/QC process described below.

Operational Use

For operational use, the ingest program is expected to be run from the UNIX cron system at an appropriate time interval, depending upon the frequency of data updates at the remote site.

Stream State

Once the ingest program starts processing a specific stream, no other ingest process may process that stream's data. This is implemented by a database field lock that is relinquished only when the ingest process holding it completes successfully.
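A database field lock of this kind is commonly implemented with a conditional update, sketched below in Python using an in-memory SQLite database. The column names name and locked, the function names, and the use of SQLite are all assumptions for illustration; the actual DEOS schema and database engine are not described here.

```python
import sqlite3


def acquire_stream_lock(conn, stream_name):
    """Try to claim a stream; return True if this process now holds the lock."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE Streams SET locked = 1 WHERE name = ? AND locked = 0",
            (stream_name,))
    # Exactly one row changes only if the stream existed and was unlocked.
    return cur.rowcount == 1


def release_stream_lock(conn, stream_name):
    """Relinquish the lock after a successful run."""
    with conn:
        conn.execute("UPDATE Streams SET locked = 0 WHERE name = ?",
                     (stream_name,))


# Hypothetical minimal schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Streams (name TEXT PRIMARY KEY, "
             "locked INTEGER NOT NULL DEFAULT 0)")
conn.execute("INSERT INTO Streams (name) VALUES ('USGS-TIDAL')")
conn.commit()
```

Because the lock is released only on successful completion, a run that fails leaves the stream locked until an operator clears the field.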

QA/QC Procedures

The ingest program implements QA and QC processes on all ingested data according to the stages described here.

Stage 0

Stage zero checks whether the data time is in the future. If so, bit 5 of the flag is set to 1.

Stage 1

Stage one checks whether the data is outside of the minimum/maximum bounds set for the particular data type. If so, bit 2 of the flag is set to 1.
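The two stages can be combined into a single flag value, as in the following Python sketch. It assumes that bits are numbered from zero at the least significant end, which this document does not state, and the function name qa_qc_flag is hypothetical.

```python
from datetime import datetime, timezone

FUTURE_TIME_BIT = 5      # stage 0: observation time is in the future
OUT_OF_BOUNDS_BIT = 2    # stage 1: value outside the data type's bounds


def qa_qc_flag(obs_time, value, minimum, maximum, now=None):
    """Return the QA/QC flag for one observation (0 means no problem found)."""
    if now is None:
        now = datetime.now(timezone.utc)
    flag = 0
    if obs_time > now:
        flag |= 1 << FUTURE_TIME_BIT      # assumes bit 0 is least significant
    if value < minimum or value > maximum:
        flag |= 1 << OUT_OF_BOUNDS_BIT
    return flag
```

Under this numbering, a future-dated observation yields 32, an out-of-bounds value yields 4, and an observation failing both checks yields 36.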

Historical Processing

The ingest program can also be used to ingest historical data. To do so, follow the steps below:

  1. Place files to be processed in the input directory for the stream.

  2. Execute the ingest program for the stream.

The idnwq Program

The idnwq program is used to enter historical DNREC water quality data into the database.

Running the idnwq Program

The program is executed by the command idnwq from the command line.

idnwq { -u username | --user username } { -p password | --password password } { -d database | --database database } { -f filename | --filename filename } [ --update | --no-update ] [ --append ] [ --statistics ] [ -v value | --verbose value ] [ -h | --help ]

Command Line Options

The idnwq program uses database configuration parameters to determine details of what to do with the various data streams it is presented.

User Option

The specification for the user option is:

{ -u username | --user username | --user=username }

This option will allow the program to connect to the database using the username provided. The user must already exist, and have sufficient permissions to access the database specified. This item is required.

Password Option

The specification for the password option is:

{ -p password | --password password | --password=password }

This option in combination with the username allows the program to connect to the database using the username provided. This item is required.

Database Option

The specification for the database option is:

{ -d database | --database database | --database=database }

This option will specify which database the program should use as a source of data, configuration items and a destination for any event logging items. This item is required.

Filename Option

The specification for the filename option is:

{ -f filename | --filename filename | --filename=filename }

This option specifies the name of the input file to be processed. This item is required.

Update Option

The specification for the update option is:

[--update]

This option overrides any database setting and forces the idnwq program to update any data already existing in the database.

No-update Option

The specification for the no update option is:

[--no-update]

This option overrides any database setting and forces the idnwq program not to update any data already existing in the database. This speeds up the ingest process and is typically used during bulk upload.

Append Option

The specification for the append option is:

[ --append ]

This option, if provided, indicates that a timestamp of the current UTC time be appended to files after they are copied or moved. The default is to not add a timestamp.

Statistics Option

The specification for the statistics option is:

[ --statistics ]

This option, if provided, indicates that the idnwq program should update the metadata statistics recorded in the database.

Verbose Option

The specification for the verbose option is:

[ -v value | --verbose value ]

The value provided for the verbose setting (a digit between 0 and 5) indicates how much output is generated, with 0 being minimal and 5 extremely verbose. Positive values send messages to the ELF; negative values also print the messages to the screen. If no value is provided, the setting is assumed to be 1.

Help Option

The specification for the help option is:

[ -h | --help ]

This option prints a summary of all the command line options and exits.

Input File Format

The input filename option must point to a file conforming to the description given in this section. The first line is assumed to be a header line and is skipped during processing.

  • Records (lines) are assumed to consist of comma-delimited fields containing the following data:

    • Station Name

    • Date of observation in YYYY/MM/DD HH:MM:SS format

    • Data type ID

    • Data value
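A file in the format listed above could be read with the Python sketch below. The function name parse_idnwq_records is hypothetical, and the sketch assumes that the fields appear in the order listed, that the data type ID is an integer, and that the data value is numeric; none of these details is confirmed by this document.

```python
import csv
from datetime import datetime


def parse_idnwq_records(text_stream):
    """Yield (station, observed, data_type_id, value) tuples from an input file."""
    reader = csv.reader(text_stream)
    next(reader, None)                   # first line is a header: skip it
    for row in reader:
        if not row:
            continue                     # ignore blank lines
        station, observed, data_type_id, value = (f.strip() for f in row)
        yield (
            station,
            datetime.strptime(observed, "%Y/%m/%d %H:%M:%S"),
            int(data_type_id),           # assumption: IDs are integers
            float(value),                # assumption: values are numeric
        )
```

A row with the wrong number of fields or a malformed date raises ValueError, which a caller could catch in order to report the offending line.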

Database Parameters

The following database tables provide configuration options for the idnwq program.

Data_Types

This table contains data for the ingest data type (e.g., air temperature or precipitation, etc.), and minimum and maximum permitted values. Ingested items outside those bounds are flagged and thus not available for user viewing.

Station_Data

This table contains the ingested station data. The flag value inserted reflects the result of the QA/QC process described in the QA/QC Procedures section.