Apr 2007
Options parsing can be difficult at times, to say the least. A number of common methods and libraries exist to assist with it. This text looks at writing option and argument parsing both homespun and with a little help.
sh
Simple parsing is easy in the shell:
while [ "$#" -gt "0" ]
do
    case $1 in
        -F)
            F_FLAG=1
            ;;
        -f)
            shift
            FILE_ARGUMENT=$1
            ;;
        -u)
            Usage
            exit 0
            ;;
        *)
            echo "Syntax Error"
            Usage
            exit 1
            ;;
    esac
    shift
done
Above, the loop walks the argument list, and each recognized option either
performs an action or assigns a variable. The POSIX getopts builtin provides
built-in parsing of single-dash options:
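while getopts ":f:Fu" opt; do
    case $opt in
        F) F_FLAG=1 ;;
        f) FILE_ARGUMENT=$OPTARG ;;
        u) usage ;;
        *) usage; exit 1 ;;
    esac
done
shift $((OPTIND - 1))   # getopts tracks its own position; drop the parsed options once, after the loop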
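A slightly fuller sketch of the same loop may help show how getopts reports problems. It assumes a POSIX shell; the names parse_args, usage, and OPERANDS are made up for this illustration and are not part of any library. With the leading colon in the option string, getopts sets opt to : for a missing argument and to ? for an unknown option, placing the offending letter in OPTARG.

```shell
#!/bin/sh
# Sketch only: parse_args, usage, and OPERANDS are hypothetical names.

usage() {
    echo "usage: $0 [-F] [-f file] [-u] [operand ...]"
}

parse_args() {
    F_FLAG=0
    FILE_ARGUMENT=""
    OPTIND=1                      # reset so the function can be called again
    # Leading ':' = silent mode; 'f:' = -f takes an argument; F and u are flags.
    while getopts ":f:Fu" opt; do
        case $opt in
            F)  F_FLAG=1 ;;
            f)  FILE_ARGUMENT=$OPTARG ;;
            u)  usage; return 0 ;;
            :)  echo "option -$OPTARG requires an argument" >&2; return 1 ;;
            \?) echo "unknown option -$OPTARG" >&2; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))         # drop the parsed options
    OPERANDS=$*                   # whatever is left are plain operands
}

parse_args -F -f /tmp/data one two
echo "F_FLAG=$F_FLAG FILE_ARGUMENT=$FILE_ARGUMENT OPERANDS=$OPERANDS"
```

The single shift after the loop, rather than a shift inside it, matters because getopts keeps its own position in OPTIND.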
A colon after an option letter indicates that the option requires an
argument; the leading colon tells getopts to report errors silently. The
getopts code is far more compact than the first example. What if the script
requires long options? One approach is simply to hard-code the long options:
while [ "$#" -gt "0" ]
do
    case $1 in
        -F|--setflag)
            F_FLAG=1
            ;;
        -f|--file)
            shift
            FILE_ARGUMENT=$1
            ;;
        -u|--usage)
            Usage
            exit 0
            ;;
        *)
            echo "Syntax Error"
            Usage
            exit 1
            ;;
    esac
    shift
done
Setting up long options this way appears simple; however, it can quickly get
out of control using the method shown above. Writing code to handle long
options that can either be sourced or easily dropped into scripts makes far
more sense. Grigoriy Strokin has a good script for this, which can be found
on his website. Following is the same code from above using getoptex:
. getoptx.sh
while getoptex "F; f; u. setflag file usage." "$@"; do
    case $OPTOPT in    # assumption: getoptex reports the matched option in OPTOPT
        F) F_FLAG=1 ;;
        f) FILE_ARGUMENT=$OPTARG ;;
        u) usage ;;
        *) usage; exit 1 ;;
    esac
done
The single characters are mapped to the long options listed after the first
dot, and the second dot terminates the whole list. Of course, there is an even
easier method as long as a few rules are observed:
while [ "$#" -gt "0" ]
do
    opt="${1//-}"                                  # strip every dash (bash syntax)
    opt=$(echo "${opt}" | cut -c 1 2>/dev/null)    # keep only the first character
    case $opt in
        F) F_FLAG=1;;
        f) shift;FILE_ARGUMENT=$1;;
        u) usage;;
        *) usage; exit 1;;
    esac
    shift
done
The problem with the last method is that the long options are not hard-coded:
the first character of the option string is cut out and used as the option. In
other words, --help and --heck will do the same thing. The idea is harmless as
long as no two long options share a first letter. Generally speaking, having
both --help and --heck valid in the same script or program should be avoided
anyway if possible.
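One way around the shared-first-letter problem, sketched here under the assumption that a small hard-coded table is acceptable, is to translate each known long option to its short form before the case statement and to reject anything else. The helper names long_to_short and parse_long_args are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Sketch only: long_to_short and parse_long_args are hypothetical helpers.

# Map a long option to its short equivalent; unknown long options fail.
long_to_short() {
    case $1 in
        --setflag) printf '%s\n' "-F" ;;
        --file)    printf '%s\n' "-f" ;;
        --usage)   printf '%s\n' "-u" ;;
        --*)       return 1 ;;              # long option not in the table
        *)         printf '%s\n' "$1" ;;    # anything else passes through unchanged
    esac
}

parse_long_args() {
    F_FLAG=0
    FILE_ARGUMENT=""
    while [ "$#" -gt 0 ]; do
        opt=$(long_to_short "$1") || { echo "invalid option: $1" >&2; return 1; }
        case $opt in
            -F) F_FLAG=1 ;;
            -f) if [ "$#" -lt 2 ]; then
                    echo "option $1 requires an argument" >&2
                    return 1
                fi
                shift
                FILE_ARGUMENT=$1 ;;
            -u) echo "usage: $0 [-F|--setflag] [-f|--file name] [-u|--usage]"
                return 0 ;;
            *)  echo "invalid option: $1" >&2; return 1 ;;
        esac
        shift
    done
}

parse_long_args --setflag --file /tmp/data
echo "F_FLAG=$F_FLAG FILE_ARGUMENT=$FILE_ARGUMENT"
```

With this arrangement --heck is rejected instead of silently acting like --help, at the cost of keeping the table in step with the case statement.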
Perl
With no case statement built in, options parsing in Perl can be a little tricky. Using the same
example as the shell code above, a simple options parser might look like:
# defined() keeps an argument of "0" from ending the loop early
while ( defined( my $arg = shift @ARGV ) ) {
    if ( $arg eq '-F' ) {
        $F_FLAG = 1;
    } elsif ( $arg eq '-f' ) {
        $FILE_ARGUMENT = shift @ARGV;
    } elsif ( $arg eq '-u' ) {
        usage();
    } else {
        usage();
        exit 1;
    }
}
Relative to the shell, Perl seems a bit heavy-handed in the amount of work needed. On the other hand, the choices for handling options in Perl are almost limitless: hashes (associative arrays), arrays, or just plain scalars arranged a certain way could all be used.
Of course, another great thing about Perl is how simply string operations are handled. Using a method similar to the last shell method above can simplify the code a great deal:
for (my $argc = 0; $argc < @ARGV; $argc++) {
    my $opt = $ARGV[$argc];
    $opt =~ s/^--//; # Get rid of 2 leading dashes
    $opt =~ s/^-//; # Get rid of 1 leading dash
    $opt = substr($opt,0,1); # cut the first char
    if ($opt eq 'F') {
        $F_FLAG=1;
    } elsif ($opt eq 'f') {
        $FILE_ARGUMENT=$ARGV[++$argc];
    } elsif ($opt eq 'u') {
        usage();
    } else {
        usage();
        exit 1;
    }
}
Of course, the same two problems as the shell code that cuts out the first character exist: no two long options can start with the same letter, and there is no verification of long options. Not unlike in the shell, a simple list can be used to verify that long options are valid. Following is an example subroutine:
...
my @valid_optlongs=("setflag", "file", "usage");
my @valid_optshort=("F", "f", "u");
...
sub parseopt{
    my ($opt) = shift;
    $opt =~ s/^--//; # Get rid of 2 leading dashes
    $opt =~ s/^-//; # Get rid of 1 leading dash
    if (length($opt) > 1) {
        # Long option: look it up and return the matching short option
        for (my $i = 0; $i < @valid_optlongs; $i++) {
            if ($opt eq $valid_optlongs[$i]) {
                return $valid_optshort[$i];
            }
        }
        return; # not a valid long option
    } else {
        return $opt;
    }
}
Essentially, instead of just trimming out the first character, a long option is checked against the list of valid long options, and the matching single-character option it correlates to is returned.
Ultimately, the Getopt::Std module should be used if it is
available; why reinvent the wheel? Here is an example of using the
module:
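use Getopt::Std;
...
our ($opt_f, $opt_u, $opt_F);
getopts('f:uF');    # -f takes an argument; -u and -F are plain flags
die "Usage: $0 [ -f filename -u ]\n"
    unless ( $opt_f or $opt_u );
if ($opt_f) {
    my $filename = $opt_f;    # getopts stores the argument of -f in $opt_f
} elsif ($opt_u) {
    usage();
    exit 0;
}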
Definitely shorter and more compact.
C
C, one of the oldest high-level programming languages still in wide use, offers, not unlike Perl, many different approaches a programmer can take without using libraries:
#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
    if (argc < 2) {
        printf("usage: %s number-of-execs sbrk-size job-name\n",
            argv[0]);
        exit(1);
    }
    ....
int main(int argc, char *argv[]) {
    int c;
    for (c = 1; c < argc; c++) {    /* argv[0] is the program name */
        if (argv[c][0] == '-' && argv[c][1] == 'F') {
            F_FLAG = 1;
...
The C library offers two levels of built-in option handling: getopt for
single-character options and getopt_long for long options. Since these
routines are available in most modern implementations, the examples use GNU's
version.
...
#include <getopt.h>
...
int main (int argc, char **argv)
{
    int c;
    char * file;
    /* "Ff:u": -F and -u are flags, only -f requires an argument */
    while ((c = getopt(argc, argv, "Ff:u")) != -1) {
        switch (c) {
        case 'F':
            F_FLAG = 1;
            break;
        case 'f':
            file = optarg;
            break;
        case 'u':
            usage();
            return 0;
        default:
            usage();
            return 1;
        }
    }
This is far more succinct than the previous C examples, which would have ended up as spaghetti code. Long options are even more interesting. The GNU C library handles long options through an array of structures in which each long option names the single character that getopt_long returns for it:
...
#include <getopt.h>
...
int main(int argc, char **argv)
{
    int c;
    char *file;
    while (1)
    {
        static struct option long_options[] =
        {
            {"setflag", no_argument,       0, 'F' },
            {"file",    required_argument, 0, 'f' },
            {"usage",   no_argument,       0, 'u' },
            {0, 0, 0, 0}  /* an all-zero entry terminates the array */
        };
        int option_index = 0;
        /* "Ff:u": only -f (and therefore --file) takes an argument */
        c = getopt_long (argc, argv, "Ff:u", long_options, &option_index);
        if (c == -1) break;
        switch (c) {
        case 'F':
            F_FLAG = 1;
            break;
        case 'f':
            file = optarg;
            break;
        case 'u':
            usage();
            return 0;
        default:
            usage();
            return 1;
        }
    }
Short, sweet and to the point.
Sometimes parsing can be extremely simple. Adding long options and flag setting to the mix can be daunting when writing from the ground up; luckily, libraries and modules exist to help along the way.