GCF2ASC 1.7

GCF2ASC is a command line utility for converting GCF data to ASCII text files in a user-specified layout. Flexible command line parameters allow the user to control the layout of the header and data sections of the ASCII file.

Note: GCF2ASC will only process data for a single stream (i.e. a single component at a single sample rate from a single instrument). If you wish to convert a file containing multiple streams, you should pre-process it using GCFSPLIT and then convert the resulting files one by one.

Download

GCF2ASC v1.7 for Windows (84K .zip)

or

GCF2ASC v1.7 for Linux i386 (97K .gz).

Note: The Linux binary needs access to the Qt runtime library (2.4M .gz), either in your normal library path or in the current directory.

Note: This 32-bit software runs on 32-bit and 64-bit operating systems. To install it on a 64-bit Linux platform, please follow these instructions to install the relevant libraries.

Usage

To get a list of the options, run the gcf2asc program without any parameters. The following text is displayed:

	GCF2ASC      v1.7      (c) Guralp systems 2013
	Converter for GCF to ASCII text format
	Usage: GCF2ASC gcffile [/NoHdr] [/lineX="format string"] [/spl=X] [/fw=X]
			       [/uff] [/csmip] [/gm]
	  where:
		gcffile  is a vaild GCF file, or files containing data for one
			 stream. Wildcards are supported.
		/NoHdr   suppresses generation of the text header line(s).
		/lineX=  X is a line number (1, 2, 3, etc).
			 The format string can be used to create a custom header in
			 the file, to the user's requirements.
			 If not specified, the default is:
			 /line1="IIIIII  TTTTTT  YYYY MM DD HH NN SS  PPP"
			 See 'format specifiers' in the Scream help for more info
		/spl=X   Set 'X' Samples Per Line (default 1)
		/fw=X    Set the field width of each sample to 'X'. (default 12)
			 One of the characters is reserved for a space, so a value
			 of 12 allows for 11 digit numbers (including a '-' character)
		/uff     Default the header to be 'UFF' layout. "/lineX=" can be used
			 to override individual lines.
		/csmip   Default the header to be 'csmip' layout."/lineX=" can be
			 used to override individual lines.
		/gm      Read calvals.txt file (if available), and save values in
			 ground motion units.

	  A text file of the same name with extension '.txt' is generated containing
	  data points in ASCII numbers.
	  Any gaps in the time-series will be filled with value -2147483647
	  Any blocks with a timestamp going backwards will be ignored.

The simplest way to use the program is just to specify one parameter: a GCF file name. The default layout options will be used, and an ASCII file will be generated with one line of header, and one sample per line of data:

 ___ekb  _ekbz2  2003 05 21 18 48 00  040
     -113676
     -113649
     -113630
     -113605
     -113580
     -113546
     -113517
     -113485
     …
     …
     ⋮

You can also use wildcards in the filename, so gcf2asc *.gcf will convert every GCF file in the current directory. The name of each converted file will be the same as the GCF file name, with the extension changed to .txt.
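When conversions are scripted, it can help to assemble the command line and predict the output file name before running anything. The following is a minimal Python sketch under stated assumptions: the helper names `output_name` and `build_command` are hypothetical, the executable is assumed to be reachable as `gcf2asc`, and only switches listed in the usage text above are used.

```python
from pathlib import Path


def output_name(gcf_path):
    """Return the name the text describes: same file name, extension .txt."""
    return str(Path(gcf_path).with_suffix(".txt"))


def build_command(gcf_path, spl=None, fw=None, no_header=False, exe="gcf2asc"):
    """Assemble an argument list using only switches from the usage text."""
    cmd = [exe, gcf_path]
    if no_header:
        cmd.append("/NoHdr")
    if spl is not None:
        cmd.append(f"/spl={spl}")
    if fw is not None:
        cmd.append(f"/fw={fw}")
    return cmd


if __name__ == "__main__":
    # List what would be run for every .gcf file in the current directory.
    for gcf in sorted(Path(".").glob("*.gcf")):
        print(" ".join(build_command(str(gcf), spl=5)), "->", output_name(str(gcf)))
```

The sketch only prints the commands. Passing each list to `subprocess.run` would execute them, one file at a time.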

The header line is generated using the format string IIIIII  TTTTTT  YYYY MM DD HH NN SS  PPP. GCF2ASC interprets this string, replacing the special codes with the actual values from the source GCF file. For example, YYYY is replaced with a four-digit number representing the year of the start of the data in the GCF file. See below for a full list of codes.
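A file written with the default layout is straightforward to read back. The sketch below is an illustration rather than part of GCF2ASC: the function name `parse_default_output` and the dictionary keys are hypothetical, and it assumes the default one-line header, IDs that contain no spaces, and the gap value -2147483647 given in the usage text.

```python
GAP = -2147483647  # value GCF2ASC writes where the time series has a gap


def parse_default_output(text):
    """Parse text produced with the default header line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    fields = lines[0].split()
    header = {
        "system_id": fields[0].lstrip("_"),   # IIIIII, padding removed
        "stream_id": fields[1].lstrip("_"),   # TTTTTT, padding removed
        "start": tuple(int(f) for f in fields[2:8]),  # YYYY MM DD HH NN SS
        "sps": int(fields[8]),                # PPP
    }
    samples = []
    for ln in lines[1:]:
        for token in ln.split():              # also copes with /spl values above 1
            value = int(token)
            samples.append(None if value == GAP else value)
    return header, samples
```

Gaps are returned as `None` so that they cannot be mistaken for real sample values in later processing.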

If you wish to suppress all headers, and simply have a file containing the raw samples, use the /NoHdr command line option. This option is not advisable, as it loses important information about the data, such as time, sample rate, source name (system it came from), etc.

You can use the /lineX="" option to specify the header information you need (where X specifies the line number within the header). Note that you can have as many of these parameters as necessary, allowing multiple line headers to be constructed. Upper case letters are converted using the codes below; lower case letters and numbers are left exactly as they are. So, using the following command (entered on a single line; note the " marks around each format string):

gcf2asc ekbz2.gcf /line1="gcf2asc sample conversion"
                  /line2="datetime: YYYY-MM-DD HH:MM:SS"
                  /line5="sysid:I  streamid:T  Psps"
                  /line6=""

This generates output like:

 gcf2asc sample conversion
 datetime: 2003-05-21 18:05:00


 sysid:ekb  streamid:ekbz2  40sps

     -113676
     -113649
     -113630
     -113605
     -113580
     -113546
     …
     …
     ⋮

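Notice that because line5 was specified, empty lines are inserted for lines 3 and 4, even though they were not listed. Note also that the parameter /line6="" was used to insert a blank line between the header and data sections. Finally, MM is the month code, which is why the time in line 2 reads 18:05:00; to print the minutes, use NN (that is, HH:NN:SS).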
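The way unlisted line numbers become empty lines can be sketched in a few lines of Python. This illustrates the behaviour described above and is not GCF2ASC's own code; `assemble_header` is a hypothetical name, and the dictionary stands in for the /lineX= switches.

```python
def assemble_header(lines):
    """lines maps a 1-based line number to its text, like the /lineX= switches."""
    if not lines:
        return []
    last = max(lines)
    # Any line number that was not given becomes an empty line.
    return [lines.get(n, "") for n in range(1, last + 1)]


header = assemble_header({
    1: "gcf2asc sample conversion",
    2: "datetime: 2003-05-21 18:05:00",
    5: "sysid:ekb  streamid:ekbz2  40sps",
    6: "",
})
print("\n".join(header))
```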

If you need to insert a literal character without it being interpreted, use the \ escape character. For example,

YYYY \Years

generates the text 2003 Years. Note that the \ escapes only the Y, not the ‘ears’: lower case characters are treated as literals anyway and do not need escaping. To get the output NONE, you should use \N\O\N\E (the ‘O’ does not need escaping, but escaping it causes no problems).
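When a header needs a fixed piece of text containing capitals, the escapes can be added mechanically. The helper below is a small sketch; the name `escape_literal` is hypothetical, and it relies on the observation above that escaping an upper-case letter that is not a code does no harm. It does not handle text that itself contains a backslash.

```python
def escape_literal(text):
    """Prefix every upper-case ASCII letter with a backslash."""
    return "".join("\\" + ch if "A" <= ch <= "Z" else ch for ch in text)


print(escape_literal("NONE"))   # \N\O\N\E
print(escape_literal("Years"))  # \Years
```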

The command line option /uff loads a specific set of /lineX= lines, which are suitable for producing a header in UFF dataset 58 format. This default uses a ‘minimum’ set of header entries. The user can override individual lines using /lineX= options. An example of where this could be useful is to add a detailed description to lines 4, 6 and 7 (records 2, 4 and 5 of the UFF dataset 58 header).
This option also defaults the spl value (Samples Per Line – see below) to 6. The user can override this value using the /spl= option.

To control the layout of the samples, you can use the /spl= and /fw= parameters.
/spl= sets the Samples Per Line. The default (as shown in the example above) is 1 (/spl=1). The following output sample was generated using the same command line as the previous example, but with the /spl=5 parameter added.

 gcf2asc sample conversion
 datetime: 2003-05-21 18:05:00


 sysid:ekb  streamid:ekbz2  40sps

     -113676     -113649     -113630     -113605     -113580
     -113546     -113517     -113485     -113460     -113444
     -113421     -113405     -113382     -113368     -113344
     -113330     -113326     -113311     -113301     -113282
     -113267     -113245     -113226     -113205     -113185
     -113171     -113150     -113137     -113122     -113109 
     …
     …
     ⋮

Note that to prevent any new-line characters being inserted (all samples on one line), use the /spl=0 parameter.

/fw= sets the Field Width, that is, the number of characters used for each sample. The default is 12 (one space plus eleven characters for the number). If the number is negative, the minus sign counts as one of the eleven characters, leaving ten for the digits.

If the digitising source is a 16 bit digitiser, then the numbers are going to be in the range -32,768 to 32,767, which means 5 digits for the number, 1 for the -ve sign, 1 for the space, so a value of /fw=7 is sufficient.
A 24 bit digitiser can generate numbers in the range -8,388,608 to 8,388,607, so the minimum width should be 9 (7 digits, +1 for the -ve sign, +1 for the space).
32 bit numbers can be in the range -2,147,483,648 to 2,147,483,647, which needs 12 (10 digits +2 as above). This is the default setting.
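The arithmetic above generalises to any bit depth: count the characters in the most negative value (digits plus the minus sign) and add one for the space. A minimal Python sketch, with the hypothetical name `min_field_width`:

```python
def min_field_width(bits):
    """Smallest /fw value that fits any signed integer of the given bit depth."""
    most_negative = -(2 ** (bits - 1))
    return len(str(most_negative)) + 1  # +1 for the separating space


for bits in (16, 24, 32):
    print(bits, min_field_width(bits))
```

For 16, 24 and 32 bits this gives 7, 9 and 12, matching the figures worked out above.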

If a width is specified that is too small for the values in the data, the entire value is printed anyway, including a space between the values (as they become difficult to read otherwise), and the rest of the line is shifted. An example is shown below, where the parameters /fw=4 /spl=5 are used (the default header is used):

 __d850  _eka12  2003 05 21 18 48 00  040
 135 130 113 128 125
  55  36  63  58  33
   5 -15 -10 -18 -72
 -80 -53 -48 -59 -74
 -86 -109 -130 -127 -110
 -118 -126 -124 -107 -89
 -100 -103 -102 -98 -97
 -99 -97 -83 -78 -66
 -66 -91 -97 -110 -110
 -112 -100 -79 -51 -40
 -68 -56 -82 -85 -58
 -75 -76 -83 -83 -90
 …
 …
 ⋮
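The layout rules for /spl= and /fw= can be reproduced in a short Python sketch. This is an approximation inferred from the sample outputs shown here, not the program's actual code: the name `format_samples` is hypothetical, and each value is assumed to be written as one space followed by the number right-justified in the remaining width, with over-wide numbers printed in full.

```python
def format_samples(samples, spl=1, fw=12):
    """Return the data lines for the given Samples Per Line and Field Width."""
    # One reserved space, then the value right-justified in fw - 1 characters.
    # str.rjust never truncates, so over-wide values simply shift the line.
    cells = [" " + str(v).rjust(fw - 1) for v in samples]
    if spl == 0:  # /spl=0: all samples on one line
        return ["".join(cells)] if cells else []
    return ["".join(cells[i:i + spl]) for i in range(0, len(cells), spl)]


for line in format_samples([-80, -53, -48, -59, -74, -86, -109, -130, -127, -110],
                           spl=5, fw=4):
    print(line)
```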

Note that GCF data is stored in a compressed binary format in which most samples require two bytes of storage each. Compare this to the default ASCII output of the converter, which requires 14 bytes per sample (a default Field Width of 12, plus two for the end of line, CRLF). This means the converted output file could be approximately seven times larger than the GCF input file, which should be considered when converting large amounts of data.
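The size estimate can be worked through for other settings with a few lines of Python. This is a rough sketch: `ascii_bytes` is a hypothetical name, the header is ignored, a two-byte line ending is assumed, and every value is assumed to fit within the field width.

```python
def ascii_bytes(n_samples, spl=1, fw=12, eol=2):
    """Approximate size in bytes of the data section of the output file."""
    if spl == 0:
        lines = 1 if n_samples else 0
    else:
        lines = -(-n_samples // spl)  # ceiling division
    return n_samples * fw + lines * eol


n = 1000
print(ascii_bytes(n))             # default layout: 14 bytes per sample
print(ascii_bytes(n) / (2 * n))   # ratio to roughly two bytes per sample in GCF
print(ascii_bytes(n, spl=5))      # fewer line endings with five samples per line
```

Putting more samples on each line, or narrowing the field width for 16 or 24 bit data, reduces the overhead.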

Wildcard Substitution

A versatile formatting system allows the user to incorporate data extracted from the GCF file into header lines in the output file. Every time one of the upper-case codes listed below appears in a /lineX= argument, it is replaced by data from the input file. The codes are the same as those used for generating file names in Scream!’s recording facility, with some additional codes specific to this program.

The following format specifiers are supported:

Specifier	Replaced with
`YY`	year as a two digit number (e.g. 98 for 1998).
`YYYY`	year as a four digit number (e.g. 1998).
`M`	month as a number without a leading zero (e.g. 1-12)
`MM`	month as a number padded to two digits (e.g. 01-12)
`MMM`	month as a short name (e.g. Jan) in the language configured for your system
`D`	date as a number without a leading zero (e.g. 1-31)
`DD`	date as a number padded to two digits (e.g. 01-31)
`H`	hour without a leading zero (e.g. 0-23)
`HH`	hour padded to two digits (e.g. 00-23)
`N`	minute without a leading zero (e.g. 0-59)
`NN`	minute padded to two digits (e.g. 00-59)
`S`	second without a leading zero (e.g. 0-59)
`SS`	second padded to two digits (e.g. 00-59)
`R`	day of year without leading zeros (e.g. 1-366)
`RRR`	day of year padded to three digits (e.g. 001-366)
`X`	Date code represented as an 8 digit hexadecimal number. Allows complete date to fit in a DOS 8.3 format.
`I`	system ID, without leading underscores (e.g. TEST)
`IIIIII`	system ID, padded to six characters (e.g. __TEST)
`T`	stream ID, without leading underscores (e.g. DMZ2)
`TTTTTT`	stream ID padded to six characters (e.g. __DMZ2)
`E`	serial number, without leading underscores. This is the stream ID without the last two characters.
`EEEE`	serial number as above, padded to four characters (e.g. _456)
`A`	The mapped Stream ID. If no mapping is defined for the stream, it is the same as `T`.
`C`	component identifier (Z,N,E,M, etc).
`P`	samples per second without leading zeros (e.g. 4-200)
`PPP`	samples per second padded to three digits (e.g. 004-200)
`|NUMSMPLS|`	† the number of samples in the file, right-justified and padded to ten characters with spaces (e.g. "      1000")
`|SMPL--PER|`	† the sample period, in seconds, expressed as an eleven-character floating point number in "scientific notation" (e.g. 1.0000E-003 for one millisecond)
`|COMPNUM|`	† the component number (i.e. 1=Vertical, 2=North/South and 3=East/West) in a nine-character field
`|SMP|`	‡ the number of samples in the file, right-justified and padded to five characters with spaces (e.g. " 1000")
`|SMPS|`	‡ the number of samples in the file, right-justified and padded to six characters with spaces (e.g. "  1000")
`|#SECS|`	‡ the file duration, in seconds, with two decimal places, right-justified and padded to seven characters with spaces (e.g. "   2.50")
`|NUM-SECS|`	‡ the file duration, in seconds, with three decimal places, right-justified and padded to ten characters with spaces (e.g. "     2.500")
`|SMPL-PER|`	‡ the sample period, in seconds, expressed as a fixed-point number with three decimal places, right-justified and padded to ten characters with spaces (e.g. "     0.001" for one millisecond)

Items marked † are not standard Scream formatting codes and have been introduced to support standard UFF headers. Items marked ‡ are not standard Scream formatting codes and have been introduced to support standard CSMIP headers.
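The fixed-width CSMIP fields are easy to check with Python's format specifications. The sketch below is an illustration only: the function name `csmip_fields` is hypothetical, and it assumes that the file duration is the number of samples divided by the sample rate and that the sample period is the reciprocal of the sample rate.

```python
def csmip_fields(n_samples, sps):
    """Return the CSMIP-style fields as fixed-width, right-justified strings."""
    duration = n_samples / sps   # assumed definition of the file duration
    period = 1.0 / sps           # assumed definition of the sample period
    return {
        "|SMP|": f"{n_samples:>5d}",
        "|SMPS|": f"{n_samples:>6d}",
        "|#SECS|": f"{duration:>7.2f}",
        "|NUM-SECS|": f"{duration:>10.3f}",
        "|SMPL-PER|": f"{period:>10.3f}",
    }


for code, value in csmip_fields(2500, 1000).items():
    print(f"{code:<12}[{value}]")
```

The square brackets in the printed output make the padding visible.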

The format string is case sensitive, so HH will be replaced with the hour, but hh will remain as a literal. This facility can be used to add constant descriptions or field separators as desired.

The following characters cannot be used due to operating system limitations:

: * ? " < > |

Examples of format strings and their results:

`T_YYYY_MM_DD;HHhNNmSSs`	dmz2_1997_10_05;07h35m20s
`T.YYYY.RRR_HHNNSS`	dmz2.1997.278_073520
`T.YYYY.M.D;H.N.S;C;P`	dmz2.1997.10.5;7.35.20;z;100
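The substitution can be approximated in Python for a subset of the codes. This is a sketch under stated assumptions and not the algorithm GCF2ASC or Scream actually uses: the function name `expand` is hypothetical, a code is assumed to be a run of one repeated upper-case letter, unrecognised runs are left unchanged, and the stream ID and component are passed in rather than read from a GCF file.

```python
from datetime import datetime


def expand(fmt, when, stream_id, component, sps):
    """Expand the Y, M, D, H, N, S, R, T, C and P codes in a format string."""
    yday = when.timetuple().tm_yday
    table = {
        "YY": f"{when.year % 100:02d}", "YYYY": f"{when.year:04d}",
        "M": f"{when.month}", "MM": f"{when.month:02d}",
        "D": f"{when.day}", "DD": f"{when.day:02d}",
        "H": f"{when.hour}", "HH": f"{when.hour:02d}",
        "N": f"{when.minute}", "NN": f"{when.minute:02d}",
        "S": f"{when.second}", "SS": f"{when.second:02d}",
        "R": f"{yday}", "RRR": f"{yday:03d}",
        "T": stream_id, "C": component,
        "P": f"{sps}", "PPP": f"{sps:03d}",
    }
    out, i = [], 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "\\" and i + 1 < len(fmt):
            out.append(fmt[i + 1])      # escaped character is copied literally
            i += 2
        elif "A" <= ch <= "Z":
            j = i
            while j < len(fmt) and fmt[j] == ch:
                j += 1                  # collect the run of the same letter
            run = fmt[i:j]
            out.append(table.get(run, run))
            i = j
        else:
            out.append(ch)              # lower case, digits, punctuation
            i += 1
    return "".join(out)


when = datetime(1997, 10, 5, 7, 35, 20)
print(expand("T_YYYY_MM_DD;HHhNNmSSs", when, "dmz2", "z", 100))
print(expand("T.YYYY.RRR_HHNNSS", when, "dmz2", "z", 100))
```

With these inputs the sketch reproduces the example results listed above.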

Known Issues

GCF2ASC version 1.5, which is part of the Scream distribution, contains a bug that can cause malformed output: the header incorrectly reports zero for the number of samples and the file duration, and a second, correct but incomplete header appears at the end of the file. This is fixed in version 1.6, which also introduced the /gm option. Version 1.7 adds support for 800sps data, and all users are advised to upgrade.