GREP -- Find Regular Expressions in Files
User Guide

release 5.33, program and document revised 19 August 2001
Copyright © 1986-2001 by Stan Brown, Oak Road Systems

GREP is a filter that searches input files, or the standard input, for lines that contain matches for one or more patterns called regular expressions and displays those matching lines.

              
 Why GREP?
 Getting Started
  System Requirements
  Installation
  Evaluation, License, and Warranty
 User Instructions
 Input Files
  File and Path Names
  Binary Files and Long Lines
  Wild Card Expansion
  Subdirectory Searches
 Options
  0   1   ?   A   B   C   D   F   H   I   L   N   P   Q   R   S   U   V   W   Y   Z  
  Pattern-Matching Options: /F, /I, /V, /Y
  Input File Options: /A, /R, /S, /W
  General Options: /D, /Q, /Z, /0, /1, /?
  Output Options: /B, /C, /H, /L, /N, /P, /U
  Environment Variable
 Regular Expressions (Regexes)
  Normal and Special Characters
  How to Construct a Regex
  Special Rules for the Command Line
 Return Values (ERRORLEVEL)
 Problems
 What's New in 5.3?

 

Why GREP?


The DOS filter FIND is useful for finding a given string in one or more files. But what if you want to find the word the in caps or lower case, without also finding other, There, then, and so on? You don't really want to search for a specific string. Rather, what you're looking for is a regular expression or regex, namely the preceded and followed by something other than a letter. GREP to the rescue!

GREP combines most features of UNIX grep and fgrep. GREP has many other advantages over FIND besides using regular expressions:


Getting Started


System Requirements

The 16-bit version, GREP16, runs under DOS 2.0 or higher, including a DOS box under Windows. The 32-bit version, GREP32, requires a DOS box under Windows 98, Win95, or Win NT 4.0. (I fully expect it to run in Windows 2000, but have not tested it.)

The two executables operate the same and have the same features, except that GREP32 supports long filenames. If you typically run GREP in a DOS box under Windows 9x or NT, GREP32 is the one you want.

Installation

There is no special installation procedure. Simply move GREP16.EXE, GREP32.EXE, or both to any convenient directory in your path.

You may wish to rename the executable you use more often to the simpler GREP.EXE. All the examples in this user guide will assume you've done that. Otherwise, just substitute GREP16 or GREP32 wherever you see GREP in the examples.

Evaluation, License, and Warranty

GREP is shareware. If you use it past a 30-day evaluation period, you are morally and legally bound to register and pay for it. Please see the file LICENSE.TXT for full details, including support and warranty information.

The registered version offers these improvements over the evaluation version:


 

User Instructions


For a quick summary of operating instructions, type
        grep /? | more
The full command form is either of
        grep [options] [regex] [<inputfile] [>outputfile]
        grep [options] [regex] inputfiles [>outputfile]
In the first form, GREP is a filter, taking its input from the standard input (most likely piped from some other command). In the second form, GREP takes its input from any number of input files, possibly specified with paths and wild cards.

In both forms, the optional outputfile will receive the matching lines (or other output, depending on the output options). For output to the screen, omit > and outputfile.

regex is a regular expression; see below for how to construct one. A regex is normally required on the command line; however, if you use the /F option, regexes will be taken from a file or the keyboard instead of the command line.

The command-line options, and the values returned through ERRORLEVEL, are explained below. You can actually put options anywhere on the command line, not just before the regex. All the options are processed before any files are scanned, so it doesn't matter whether a given option comes before or after the files or between two file specs.

Example:

        grep /I pic[t\s] \proj\*.cob >prn
will examine every COBOL source file in the PROJ directory and print every line that contains a picture clause ("pic" followed by either "t" or a space) in caps or lower case (the /I option).
        grep /I /S pic[t\s] \*.cob >prn
will examine every COBOL source file in all directories on the current disk (the /S option).
 

Input Files


As mentioned earlier, you can use GREP as a standard filter, either by piping from another command with "|" or by redirecting input from a file with "<". If you don't specify any regular files or any redirection for input, GREP will simply wait. You can end the GREP run by pressing Control-Z and Enter, or Control-Break.

GREP can read text files just fine, whether lines are separated by the DOS-style carriage return plus line feed or the UNIX-style line feed only. See below for binary files.

File and Path Names

When telling GREP to read input files, you can specify them in the normal way with paths and wild cards. For example:

        grep regex ..\*.c *.h d:\dir1\dir2\orich?.htm
The separator between directories in a path can be a backslash "\" or forward slash "/".

If input file names or paths contain spaces, you must enclose them in double quotes. This is a DOS restriction, not a feature only of GREP. For instance,

        grep regex c:\Program Files\My Office\*
contains three file specs, namely c:\Program, Files\My, and Office\*. That's probably not what you meant. Double quotes preserve your intended meaning:
        grep regex "c:\Program Files\My Office\*"

GREP thinks that anything that starts with a hyphen is an option. So if a file name starts with a hyphen, use the standard DOS syntax for "current directory". For example, to search file -omega.txt, type

        grep regex ./-omega.txt

Binary Files and Long Lines

When GREP reads input files, it normally treats them as line-oriented text. However, you can also use GREP with binary files such as executables and word-processing files. Please see the description of the /R option for details.

If a text file contains null characters (ASCII 0) or Control-Z characters (ASCII 26), GREP cannot process it correctly in text mode and you must use the /R option to invoke binary mode.

When reading a very long input text line, GREP processes it in chunks. Please see the description of the /W option for details.

Wild Card Expansion

There are several important things to bear in mind about how GREP expands file names containing ? or *:

Subdirectory Searches

If you specify the /S option, GREP will search not only the files indicated on the command line, but also the files in subdirectories.

For example, with the command

        grep /S regex \hazax*.* *.c g:\mumble\*.htm
GREP will examine all files on the entire current drive whose names start with hazax; then it will look at all C source files in the current directory and all subdirectories under it; finally it will look at all HTML files in directory g:\mumble and all subdirectories under it.

Perhaps a more realistic example is this: you have a document about Vandelay Industries somewhere on your disk, but you can't remember where. This command should find it:

        grep Vandelay /S \*.*
(You can abbreviate \*.* to \* with GREP32.) You may also want to use the /I option if you can't remember whether "Vandelay" was capitalized normally.

Subdirectory search follows the normal file-searching rules: hidden and system subdirectories are normally ignored. (Yes, you have them if you have Windows 9x.) The /A option also applies during subdirectory search: with /S and /A together, GREP will search every subdirectory. There's no way to search every subdirectory but only normal files, or to search only normal subdirectories but to search for hidden files in them.

You may want to know in what order GREP examines files when the /S option is set. Ordinarily, GREP examines all files in the first file argument, including the subdirectory tree, then proceeds to the second file argument, and so on. However, when you use the /S option and none of the file arguments contains a path, GREP will look first for all those files in the current directory, then for all of them in the first subdirectory, and so on.

If you give GREP a filename argument that doesn't exist, it will normally tell you, unless you used the /Q option. However, when you specify /S (search subdirectories), GREP can't give such a warning because the specified file may exist in some subdirectory.

(The /S option is fully functional in the registered version, and will search all the way to the bottom of a directory tree. In the evaluation version, GREP will search the named or implied directories and all directories immediately below them, but no further in any one execution.)

The /D option will show you every directory and wild-card search as GREP performs it. The output also contains lots of other stuff, but the file visits all contain the string "GX:".
 


Options


GREP's operation can be modified by several options, either on the command line or in an environment variable (see below). On the command line, options can appear anywhere, before or after the regex and the file specs. All options are processed before any files are read.

Four sections below describe the options in detail, by functional groups: pattern-matching options, input file options, general options, and output options. Here is a quick index of all the options:

0   1   ?   A   B   C   D   F   H   I   L   N   P   Q   R   S   U   V   W   Y   Z  

You have a lot of freedom about how you enter options: use a leading hyphen or slash, use upper- or lower-case letters, and leave spaces between options or combine them. For instance, the following are just some of the different ways of turning on the /P3 and /B options:

        /p3 -b    /b/P3    /p3B    -B/P3    -P3 -b
This user guide will always use capital letters for the options, to make it easier to distinguish letter l and figure 1.
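
To see why all those spellings are equivalent, here is a rough Python sketch of one way such option strings could be normalized. It is not GREP's actual parser. The name parse_options is hypothetical, only the numeric options /P and /W are given arguments, and filename options (/F, /D) and the reversing effect of repeating an option are deliberately left out.

```python
# Illustrative sketch only; not GREP's real option parser.
NUMERIC = {"P", "W"}  # options assumed here to take a numeric argument

def parse_options(text):
    """Return the sorted, de-duplicated options found in an option string."""
    found = set()
    for token in text.split():
        i = 0
        while i < len(token):
            ch = token[i]
            i += 1
            if ch in "/-":          # a leading slash or hyphen carries no meaning
                continue
            letter = ch.upper()     # caps and lower case are equivalent
            arg = ""
            if letter in NUMERIC:   # collect digits and commas, e.g. P3 or P1,1
                while i < len(token) and (token[i].isdigit() or token[i] == ","):
                    arg += token[i]
                    i += 1
            found.add(letter + arg)
    return sorted(found)

for spelling in ["/p3 -b", "/b/P3", "/p3B", "-B/P3", "-P3 -b"]:
    print(spelling, "->", parse_options(spelling))
```

Every spelling from the example above comes out as the same pair of options, B and P3.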

Pattern-Matching Options: /F, /I, /V, /Y

/Ffile   or   /F-
Read one or more regexes from file instead of taking a single regex from the command line, and report lines from the input file(s) that match any of the regexes read from file. You must enter the regexes one per line in the file; don't surround them with quotes. (This is similar to the F option in UNIX grep, but unlike UNIX grep, you can have multiple regexes in the file.)
 
file must follow the /F with no intervening space, and the filename ends at the next space. If you use a minus sign as the filename (/F- option), GREP will accept regexes from standard input. Don't do this if you are redirecting input from a file with the < character!
 
When the file contains two or more regexes, GREP normally reports each line from the input file that matches any of the regexes. The /V option and /Y option modify that behavior according to the rules of logic. Specifically:
 
When two or more regexes are read from file:

    and these options are set    then GREP reports each line from the input files if it matches
    -------------------------    ----------------------------------------------------------------
    not /V, not /Y               at least one of the regexes
    /V, not /Y                   none of the regexes
    /Y, not /V                   all of the regexes
    /V and /Y                    less than all of the regexes (possibly none)
 
(The /Ffile option is activated only in the registered version. /F- works in the evaluation version and the registered version.)
 
/I
Ignore case, treating caps and lower case as matching each other. (This is the same as the I option in UNIX grep and DOS FIND.)
 
Caution: the /I option does not apply to 8-bit characters (characters 128-255). Because there are many different encoding schemes, it is ambiguous which characters above 127 correspond to each other as upper and lower case on your computer. Therefore, if you want case-blind comparisons, you must explicitly code any 8-bit upper and lower case in your regex. For instance, to search for the French word "thé" in upper or lower case, code it as th[éÉE] since é can be upper-cased as É or as plain E. The "th", being 7-bit ASCII characters, will be found as upper or lower case by the /I option. (You may need to code 8-bit characters like éÉ in a special way if you enter them on the command line; see Special Rules for the Command Line below.)
 
/V
Show or count the lines that don't match instead of those that do. (This is the same as the V option in UNIX grep and DOS FIND.)
 
For the effect of the /V option with two or more regexes, see the /F option above.
 
/Y
When multiple regexes are being sought (/F option), report as matching lines only the lines from the input files that match all the regexes in any order.
 
For example, if you use the /F option and enter the two regexes brown and fox, then all of these lines will match:
        The quick brown fox
        I see a brown smudge
        Crazy like a fox
        The fox's tail is brown
But if you also use the /Y option, then GREP will match only lines that contain both the regular expressions, namely the first and fourth lines in the example. In other words, multiple regexes are normally joined by OR, but with the /Y option they are joined by AND.
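
If the OR/AND logic is easier to follow as code, here is a small Python sketch of the four /V and /Y combinations described under the /F option, applied to the example lines above. It only models the selection rule. The function name reported and its parameters are made up for illustration, and Python's re module stands in for GREP's own regex engine.

```python
import re

LINES = [
    "The quick brown fox",
    "I see a brown smudge",
    "Crazy like a fox",
    "The fox's tail is brown",
]
REGEXES = ["brown", "fox"]

def reported(lines, regexes, v=False, y=False):
    """Return the lines that would be reported for the given /V and /Y settings."""
    out = []
    for line in lines:
        hits = [re.search(rx, line) is not None for rx in regexes]
        # /Y joins the regexes with AND; without it they are joined with OR.
        matched = all(hits) if y else any(hits)
        # /V reports the lines that do NOT satisfy the condition.
        if matched != v:
            out.append(line)
    return out

print(reported(LINES, REGEXES))                  # all four lines
print(reported(LINES, REGEXES, y=True))          # first and fourth lines
print(reported(LINES, REGEXES, v=True))          # no lines
print(reported(LINES, REGEXES, v=True, y=True))  # second and third lines
```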
 
As you see from the example, with the /Y option, input lines must match all the regexes, but in any order. If you want to match all regexes in a specific order, specify them as a single regex connected with ".*". For instance, to match lines that contain "brown" somewhere before "fox", use the regex brown.*fox.
 
For the effect of the /V option with the /Y option, see the /F option above.

Input File Options: /A, /R, /S, /W

/A
Include hidden and system files when expanding wild cards (* and ?) in file specifications. Without this option, GREP will ignore hidden and system files while searching for files that match a wild card. However, if you explicitly specify a file on the command line, GREP will always read it even if it's a hidden or system file.
 
The /A option also modifies the action of the /S option (if present), determining whether subdirectories marked hidden or system will be searched.
 
/R
Read files as binary. This option lets you search for regexes in .EXE and .DLL files, word-processing files, and the like.
 
A text file has lines ending with carriage return (ASCII 13), line feed (ASCII 10), or both; and the first Control-Z (ASCII 26) marks the end of file. Also, a text file doesn't contain any NUL characters (ASCII 0). Binary files, on the other hand, may have NUL and Control-Z characters in the middle, and often don't have "lines" separated by anything.
 
DOS doesn't mark files as binary or text, and therefore GREP has no way to know which a given file may be. By default it treats all files as text, but if you specify the /R option then GREP will treat all files as binary. There's no way to treat some input files as text and others as binary within a single GREP command.
 
When GREP reads files in binary mode, there's no such thing as a line, so GREP reads files in blocks of characters. The block size is given by the /W option. Since there is no such thing as a line, the ^ and $ characters (start and end of line) in a regex are treated as normal characters.
 
The choice of text or binary mode also affects how GREP displays lines that contain matches. In normal text mode, any matching lines are displayed character for character. Non-printing characters, like tab (ASCII 9) or Control-X (ASCII 24), are given no special treatment, which means that screen output may appear strange. But in binary mode, non-printing characters are displayed using their numeric value in hex, such as <09> or <18>.
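
As a rough picture of the binary-mode display, the Python sketch below renders a block of bytes with non-printing characters shown as two-digit hex in angle brackets. The function name show_binary is hypothetical. Treating every byte outside 32-126 as non-printing, and using capital hex digits, are assumptions made for this sketch; the guide only documents forms like <09> and <18>.

```python
def show_binary(data: bytes) -> str:
    """Render bytes roughly the way binary-mode output is described above."""
    parts = []
    for b in data:
        if 32 <= b <= 126:
            parts.append(chr(b))        # printable ASCII is shown as is
        else:
            parts.append(f"<{b:02X}>")  # assumed: everything else as <hex>
    return "".join(parts)

print(show_binary(b"a\tb\x18"))  # a<09>b<18>
```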
 
Only the input files are read in binary mode. Regardless of the /R option, when you use the /F option to read the regexes from a file, that file is read in normal text mode. Also, if you don't specify any input files, GREP always scans the standard input (possibly piped with | or redirected with <) in text mode.
 
/S
Search subdirectories. Please see the section on subdirectory searches, above.
 
/Wwidth
Expect text lines up to width characters long, or process binary files in blocks of width characters. For GREP32, the default width is 4096 and you can specify anything from 10 to 2147483645; for GREP16, the default is 256 and you can specify 10 to 32765. (The width is also limited to available memory, which will depend on your system configuration and what other programs you have running at the time. In a DOS box under Windows 9x, available memory includes Windows virtual memory.)
 
Text mode (/W option without /R)
 
The CR/LF (ASCII 13 or 10 or both) at the end of a line doesn't count against the specified width. If GREP reads a long line from the input, it will break it after width+1 characters and treat the remainder as a separate line. The whole line gets scanned, but any match that starts before the break and ends after the break will be missed. Therefore, if possible you should set width large enough to hold the longest line in the file.
 
If GREP does find any lines longer than the specified or default width, it will display a warning message at the end of execution, telling you the length of the longest line. (This warning is suppressed by the /Q option.)
 
In text mode, GREP will ignore anything on a line after the first null (ASCII 0), and it will ignore the rest of the file after a Control-Z (ASCII 26). Any files that contain these characters must be scanned in binary mode for accurate results.
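
The effect of nulls and Control-Z in text mode can be sketched in a few lines of Python. This models only the two rules in the paragraph above, not GREP's real file reader or its handling of long lines. The name text_mode_lines is hypothetical.

```python
def text_mode_lines(data: bytes):
    """Return the parts of each line that a text-mode scan would actually see."""
    # Everything from the first Control-Z (ASCII 26) onward is ignored.
    data = data.split(b"\x1a", 1)[0]
    visible = []
    # bytes.splitlines() accepts CR, LF, or CR+LF as line endings.
    for raw in data.splitlines():
        # Anything after the first null (ASCII 0) on a line is ignored.
        visible.append(raw.split(b"\x00", 1)[0])
    return visible

sample = b"abc\x00hidden\r\nnext\n\x1aafter\n"
print(text_mode_lines(sample))  # [b'abc', b'next']
```

Text such as "hidden" and "after" in this sample could never be matched in text mode, which is why such files need the /R option.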
 
Binary mode (/W option and /R together)
 
Since binary files are usually not line oriented, depending on the width it is possible that a match might start in one block and end in the next, and thus be missed by GREP. One sure cure, if you have enough memory, is to specify a width at least as great as the file size. Failing that, you can minimize the problem by using a width that is large compared to the length of your regexes, or by scanning twice with two different widths.
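
Here is a small Python sketch of why a match can be missed at a block boundary and why a different width can find it. It assumes blocks are simply consecutive slices of width bytes, which is a simplification. The function name scan_blocks and the sample data are made up for illustration.

```python
import re

def scan_blocks(data: bytes, pattern: bytes, width: int) -> bool:
    """Search each width-byte block separately, as a block-wise scan would."""
    for start in range(0, len(data), width):
        if re.search(pattern, data[start:start + width]):
            return True
    return False

data = b"xxxxxxNEEDLExxxx"  # the match occupies bytes 6 through 11

print(scan_blocks(data, b"NEEDLE", 8))          # False: match straddles blocks 0-7 and 8-15
print(scan_blocks(data, b"NEEDLE", 12))         # True: match lies inside block 0-11
print(scan_blocks(data, b"NEEDLE", len(data)))  # True: the whole file is one block
```

Scanning with two widths lowers the chance of a miss but doesn't rule it out; only a width at least as large as the file is a sure cure.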

General Options: /D, /Q, /Z, /0, /1, /?

/Dfile   or   /D   or   /D-
Display debugging information. This includes whether you're running GREP16 or GREP32, whether this program is registered, the contents of the environment variable, the values of all options specified or implied, the files specified, the raw and interpreted values of the regex(es), and details of every file scanned. This information is normally suppressed, but you may find it helpful if GREP seems to behave in a way you don't expect or if you have a bug report.
 
Since the debugging information can be voluminous, if you want to see it at all you will usually want to specify an output file: file must follow the D with no intervening space, and the filename ends at the next space. GREP will append to the file if it already exists.
 
A plain /D sends debugging information to the standard error output (normally the screen). Be careful not to specify any other options between /D and the next space, or they'll be taken as a filename. Finally, /D- sends debugging information to the standard output, which you can redirect (>) or pipe (|). This intersperses debug information with the actual output of GREP.
 
You can weed through the debugging output to some extent. GREP writes the following unique strings on most lines of output, so you can send debug output to a file and then grep the file for
 
/Q
Suppress the program logo and all warning messages. Error messages will still be displayed (as will debug output, if you set the /D option).
 
(The /Q option is activated only in the registered version.)
 
/Z
Reset all options to their default values.
 
If you use the /Z option on the command line, any options in the environment variable will be disregarded, and so will any preceding options on the command line. This can be useful in batch files, to make sure that the action of GREP is controlled only by the options on the command line, and not by any settings in the environment variable.
 
The /Z option is the only single-letter option whose effect can't be reversed. If you use /Z more than once, GREP disregards the environment variable and all command-line options up through the last /Z.
 
/0 or /1
These options control the values that GREP returns in the DOS error level. /0 returns 0 if there are matches or 1 if there are no matches; /1 returns 1 for matches or 0 for no matches. For more details, see Return Values below.
 
/?
Display a help message and summary of options and regex forms, then exit with no further processing. The help message is longer than 25 lines, so if you have a 25-line screen you probably want to pipe it through more or a similar filter, like this:
        grep /? | more

You can also redirect this information. For instance,

        grep /? >prn
will send the help text to the printer.

Output Options: /B, /C, /H, /L, /N, /P, /U

Before going through the output options, let's take a moment to look at some of the possible output formats. By default, GREP's output is similar to that of DOS FIND:

        ---------- GREP.C
                op_showhead = ShowNoHeads;
                else if (op_showhead == ShowNoHeads)
                op_showhead = ShowNoHeads;

        ---------- GREP_MAT.C
                op_showhead == ShowNoHeads)
However, the /U option produces UNIX grep-style output like this:
        GREP.C:        op_showhead = ShowNoHeads;
        GREP.C:        else if (op_showhead == ShowNoHeads)
        GREP.C:        op_showhead = ShowNoHeads;
        GREP_MAT.C:        op_showhead == ShowNoHeads)
As you can see, the main difference is that DOS-style output has the filename as a header above the group of matching lines from that file, and UNIX-style output has the name of the file on every matching line.

The output options give you a lot of control over what GREP produces, but they can be confusing. Here's the executive summary:

Now, in alphabetical order, the options that control what GREP outputs and how it is formatted:

/B
Display a header for every file examined, even if the file contains no matches. (This option is meaningful only with DOS-style output, when the /U option is not set.)
 
/C
Display only a count of the matching lines in each file, instead of the matching lines themselves. (This is the same as the C option in UNIX grep and DOS FIND.)
 
Lines are counted, not matches. If a match occurs several times on a line, or several regexes match the same line, the line is counted only once.
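
In code terms, the counting rule looks something like this Python sketch. The function name count_matching_lines is hypothetical, and Python's re module stands in for GREP's regex engine.

```python
import re

def count_matching_lines(lines, regexes):
    """Count lines that match at least one regex; each line counts once."""
    return sum(
        1 for line in lines
        if any(re.search(rx, line) for rx in regexes)
    )

lines = ["fox fox fox", "brown fox", "nothing here"]
print(count_matching_lines(lines, ["brown", "fox"]))  # 2
```

The first line contains three matches and the second matches both regexes, yet each adds only one to the count.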
 
/H
Don't display any filenames as headers. This is useful when you're using GREP as a filter to extract lines from a file for processing by another program, like this:
    grep /H "Directory" <inputfile | otherprogram
 
/L
Display only a bare list of the names of files that contain matches, not the actual lines that match. With the /V option, display the names of files that contain no matches. (This is the same as the L option in UNIX grep.)
 
/N
Show the line number before each matching line. (This is the same as the N option in UNIX grep and DOS FIND.) DOS-style output with the /N option looks like this:
    ---------- GREP.C
    [ 144]        op_showhead = ShowNoHeads;
    [ 178]        else if (op_showhead == ShowNoHeads)
    [ 366]        op_showhead = ShowNoHeads;

    ---------- GREP_MAT.C
    [  98]        op_showhead == ShowNoHeads)
With both /N and the /U option together, the UNIX-style output looks like this:
    GREP.C:144:        op_showhead = ShowNoHeads;
    GREP.C:178:        else if (op_showhead == ShowNoHeads)
    GREP.C:366:        op_showhead = ShowNoHeads;
    GREP_MAT.C:98:        op_showhead == ShowNoHeads)
UNIX-style output is suitable for use with the excellent freeware editor Vim.
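
If you post-process GREP output with your own tools, it may help to see the two layouts written out as code. The Python sketch below reproduces the sample formats shown above. It is inferred from those samples only, not taken from GREP's source. The names dos_style and unix_style are hypothetical, and the input is assumed to list only files that contain matches.

```python
def dos_style(results, numbers=False):
    """results: list of (filename, [(line_number, text), ...]) pairs."""
    out = []
    for name, matches in results:
        if out:
            out.append("")                 # blank line between file groups
        out.append(f"---------- {name}")   # filename once, as a header
        for lineno, text in matches:
            out.append(f"[{lineno:4d}]{text}" if numbers else text)
    return out

def unix_style(results, numbers=False):
    out = []
    for name, matches in results:
        for lineno, text in matches:
            prefix = f"{name}:{lineno}:" if numbers else f"{name}:"
            out.append(prefix + text)      # filename on every matching line
    return out

results = [
    ("GREP.C", [(144, "        op_showhead = ShowNoHeads;")]),
    ("GREP_MAT.C", [(98, "        op_showhead == ShowNoHeads)")]),
]
print("\n".join(dos_style(results, numbers=True)))
print("\n".join(unix_style(results, numbers=True)))
```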
 
/Pbefore,after
Show context lines before and after each match. If you omit after, GREP will show the same number of lines after each match as before. If you omit both numbers, GREP will show two lines before and two lines after.
 
Either number can be 0. For instance, use /P0,4 if you want to show every match and the four lines that follow it.
 
If you use the /P option, you probably want to use the /N option as well, to display line numbers. In that case, the punctuation of the line numbers will distinguish which lines are actual matches and which are displayed for context. Here is some DOS-style output from a run with the options /P1,1N set:
    ---------- GREP.C
      143     if (opcount >= argc)
    [ 144]        op_showhead = ShowNoHeads;
      145
      177             PRTDBG "with each matching line");
    [ 178]        else if (op_showhead == ShowNoHeads)
      179             PRTDBG "NO");
      365     if (myToggle('L') || myToggle('U') || myToggle('H'))
    [ 366]        op_showhead = ShowNoHeads;
      367     else if (myToggle('B'))

    ---------- GREP_MAT.C
       97         op_showwhat == ShowMatchCount ||
    [  98]        op_showhead == ShowNoHeads)
       99         headered = TRUE;
As you can see, the actual matches have square brackets around the line numbers, and the context lines do not.
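
The selection of context lines can be sketched in Python as follows. This is an illustration of the idea, not GREP's implementation. The name with_context is hypothetical, and showing a line only once when two context windows overlap is an assumption, since the guide doesn't say how overlaps are handled.

```python
def with_context(lines, match_numbers, before, after):
    """lines: all lines of a file; match_numbers: set of 1-based matching line numbers."""
    show = set()
    for n in match_numbers:
        first = max(1, n - before)
        last = min(len(lines), n + after)
        show.update(range(first, last + 1))
    out = []
    for n in sorted(show):
        text = lines[n - 1]
        if n in match_numbers:
            out.append(f"[{n:4d}]{text}")   # matches get square brackets
        else:
            out.append(f" {n:4d} {text}")   # context lines do not
    return out

lines = ["a", "b", "c", "d", "e"]
print("\n".join(with_context(lines, {3}, before=1, after=1)))
```

With /P0,4 in mind, calling with_context(lines, {1}, 0, 4) would show the match on line 1 and the four lines that follow it.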
 
GREP16 has to allocate space for the preview lines within the same 64 K data segment as all other data. Consequently, if you specify a moderately large value, particularly with a large line width, you may get a message that GREP can't allocate space for the lines. To resolve this, use GREP32 if possible; otherwise either reduce the first number after /P, or use the /W option to reduce the line width. (The after number has no effect on memory use.)
 
/U
Show the filename with each matching line, instead of just once in a separate header. This UNIX-style output is useful with editors like Vim that can automatically jump to the file that contains a match. Some examples of UNIX-style output have been given earlier in this section.
 
There's one small difference from UNIX grep output: UNIX grep suppresses the filename when there is only one input file, but GREP assumes that if you didn't want the filename you wouldn't have specified the /U option. Neither GREP nor UNIX grep displays a filename if input comes from a file via < redirection.

In addition to these options, the /R option, described in detail earlier, makes GREP read files in binary mode, and that has a side effect on the output format.

Some combinations of output options are logically incompatible. For instance, /H/L makes no sense (don't list filenames, and list only filenames with matches). In such cases, GREP will turn off one of the incompatible options and tell you what it did (unless you suppress such messages with the /Q option). The incompatibilities are just common sense, but are listed here for completeness:
       /B   overrides /H; ignored with /L or /U
       /C   overrides /H, /L, /N, /P
       /H   ignored with /B, /C, /L, /U
       /L   overrides /B, /H, /N, /P, /U; ignored with /C
       /N   ignored with /C or /L
       /P   ignored with /C or /L
       /U   overrides /B and /H; ignored with /L

Environment Variable

If you use certain options frequently, with the registered version of GREP you can put them in the ORS_GREP environment variable. You have the same freedom as on the command line: leading slashes or hyphens, space separation or options run together, caps or lower case.

Only options can be put in the environment variable. If you want to "can" a regex, put it in a file and put /Ffile in the environment variable.

If you have some options in the environment variable but you don't want one of them for a particular run of GREP, you don't have to edit the environment variable. You can make most changes on the command line, like this:

If you're ever in doubt about the interaction of options between the command line and the environment variable, simply type

        grep /d
and GREP will tell you all the option settings in effect.

Regular Expressions (Regexes)


A regular expression or regex is a pattern of characters. It can be a simple text string, like mother, or something more complex. For instance, the regex for a U.S. telephone number is [0-9][0-9][0-9]-[0-9][0-9][0-9][0-9], which is three digits, followed by a hyphen, followed by four digits. A simpler example: the regex for any word starting with "moth" is moth[a-z]*, which is the letters "moth" followed by any number of letters a through z. Yes, that regex does match "moth" itself: see the "repetition" entry below.

A regex is essentially a string with a bunch of operators thrown in to express possibilities like "any of these characters" and "repeated".

Different utilities define regexes differently. Here is how GREP defines them.

Normal and Special Characters

To understand regexes, you need to know the difference between special characters and normal characters. (The meanings of the special characters will be explained in the next section.)

The following characters are special if they occur in the listed contexts:

Any other character, or one of the above characters not in the listed context, is a normal character. Any of the above characters also becomes a normal character if preceded by a backslash, as will be shown below.

How to Construct a Regex

Here are the rules:

single character
Any normal character matches itself. To match a special character, precede it with a backslash (\). Example: to search for the string "^abc\def", you must put backslashes before the two special characters to make GREP treat them as normal characters and not give them special meanings, so that \^abc\\def is your regex.

You can use any character from space through character 255. If using 8-bit characters or certain special characters on the command line, see Special Rules for the Command Line below.

If you specify the /I option, any letter A-Z or a-z that you specify will match both the capital and the lower case of that letter. Other letters are not affected by the /I option.
 

character class
To match any one of a group of characters, enclose them in square brackets ([ ]). Examples: [aA] will match an upper- or lower-case letter A; sno[wr]ing will match "snowing" or "snoring".

You can indicate a character range with the minus sign (-). Examples: [0-9] will match any single digit, and [a-zA-Z] will match any English letter. To match any Western European letter (under most recent versions of Windows, in North America and Western Europe), use [a-zA-ZÀ-ÖØ-öø-ÿ]. (That regex will work fine on the command line with GREP16 or in a file [/F option] with either GREP. But to enter it on the command line with GREP32, you must use numeric sequences for the 8-bit characters, for example [a-zA-Z\192-\214\216-\246\248-\255]. See "Special Rules for the Command Line" below.)

A character class can contain both ranges and single characters, and the order doesn't matter as long as each range within the class is written low-high.
 

negative character class
To match any character that is not in a class, use square brackets with a caret (^). Examples: [^0-9 ] matches any character except a digit or a space, and the[^a-z] matches "the" followed by anything except a lower-case letter.

Note: The negative character class matches any character not within the square brackets, but it does match a character. For instance, the[^a-z] matches "the" followed by something other than a lower-case letter; it does not match "the" at the end of a line because then "the" is not followed by any characters. Please see the extended example at the end of these rules for further explanation.
 

repetition
A plus sign (+) after a character or character class matches one or more occurrences; an asterisk (*) matches zero or more occurrences. Examples: snor+ing matches "snoring", "snorring", "snorrring", and so on, but not "snoing". snor*ing matches "snoing", "snoring", and so on.

Used with a character class, the plus sign and asterisk match any multiple characters in the class, not only multiple occurrences of the same character. For instance, sno[rw]+ing matches "snowing", "snorwing", "snowrring", and so on.

Obligatory example: [A-Za-z_]+[A-Za-z0-9_]* matches a C or C++ identifier, which is at least one letter or underscore, followed by any number of letters, digits, and underscores.
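
You can try the repetition examples in any regex engine that supports + and *. The Python sketch below uses the standard re module, which is a different engine from GREP's but treats these particular operators the same way. re.fullmatch is used so that each test string must match in its entirety; GREP itself looks for the pattern anywhere in a line.

```python
import re

def matches(regex, text):
    """True if the whole of text matches regex."""
    return re.fullmatch(regex, text) is not None

# One or more r's versus zero or more r's.
print(matches("snor+ing", "snorrring"), matches("snor+ing", "snoing"))  # True False
print(matches("snor*ing", "snoing"))                                     # True

# With a class, the repeated characters need not be identical.
print([matches("sno[rw]+ing", w) for w in ("snowing", "snorwing", "snowrring")])

# The identifier pattern from the text.
IDENT = "[A-Za-z_]+[A-Za-z0-9_]*"
print(matches(IDENT, "_tmp1"), matches(IDENT, "9lives"))                 # True False
```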
 

start of line, end of line
A caret (^, ASCII 94) at the start of a regex means that the pattern starts at the beginning of a line in the file(s) being searched. A dollar sign ($, ASCII 36) at the end of a regex means that the pattern ends at the end of a line in the file(s) being searched. If these characters occur anywhere else, they are treated as normal characters.

Example: ^[wW]hereas matches the word "Whereas" or "whereas" at the start of a line, but not in the middle of a line. Blanks are not ignored, so if you want to find that word whenever it's the first word of the line, you need to use a pattern like ^ *[wW]hereas to allow for indentation.

Examples: ^$ will find lines that contain no characters at all. ^ *$ will match lines that contain no characters or contain only spaces. ^ +$ will match lines that contain only spaces, but not empty lines.

Examples: ^[A-Za-z]+$ will find every line that contains nothing but English letters. ^ *[A-Za-z]+ *$ will find every line that contains exactly one English word, possibly preceded or followed by blanks.

These characters for start of line and end of line have no special meaning in binary mode, which is controlled by the /R option. In binary mode, the ^ and $ are treated as normal characters.
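
The text-mode anchor examples above can be compared side by side in a minimal sketch. Python's re module is assumed to treat ^ and $ as GREP does for single lines with no trailing newline; the dictionary keys and the classify helper are names invented for this illustration.

```python
import re

# The anchor patterns discussed above, keyed by made-up descriptive names.
patterns = {
    "empty": r"^$",
    "blank": r"^ *$",
    "spaces_only": r"^ +$",
    "letters_only": r"^[A-Za-z]+$",
    "one_word": r"^ *[A-Za-z]+ *$",
    "whereas_first": r"^ *[wW]hereas",
}


def classify(line):
    """Return the sorted names of all patterns that match one line."""
    return sorted(name for name, pat in patterns.items()
                  if re.search(pat, line))


for line in ["", "   ", "word", "  word  ", "  Whereas the party", "two words"]:
    print(repr(line), classify(line))
```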

Extended example: suppose you want to find the word "the" in a file, whether in caps or lower case. You can use the /I option to make the search case blind, and concentrate on constructing the regexes. At first glance, [^a-z]the[^a-z] seems adequate: anything other than a letter, followed by "the", followed by anything but a letter. That lets in "the" and rules out "then" and "mother". But it also rules out "the" at the beginning or end of a line. Remember that a negative character class does insist on matching some character. So the solution is to have four regexes, for "the" at the beginning, middle, or end of a line, or on a line by itself:
        ^the[^a-z]
        [^a-z]the[^a-z]
        [^a-z]the$
        ^the$
So to search for just the occurrences of the word "the", you'd put those four lines in a file and then use the /F option on GREP.
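
As a rough check of the reasoning, the four regexes can be modeled in a few lines. This is only a sketch: Python's re module stands in for GREP, re.IGNORECASE stands in for the /I option, and the helper name has_the is hypothetical. A line is selected if any one of the four patterns matches it.

```python
import re

# The four regexes from the extended example, one per pattern-file line.
the_word_patterns = [
    r"^the[^a-z]",
    r"[^a-z]the[^a-z]",
    r"[^a-z]the$",
    r"^the$",
]

# re.IGNORECASE plays the role of GREP's /I option (an assumption).
compiled = [re.compile(p, re.IGNORECASE) for p in the_word_patterns]


def has_the(line):
    """True if any of the four patterns matches somewhere in the line."""
    return any(rx.search(line) for rx in compiled)


for line in ["The cat", "over the hill", "that is the", "the",
             "then mother", "Then", "bathe daily"]:
    print(repr(line), has_the(line))
```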

Special Rules for the Command Line

The cautions and special rules in this section apply only when you enter a regex on the command line. Please ignore this section when using either form of the /F option, which I recommend when your regex is at all complicated.

When you enter a regex on the command line, you have to contend with command-line parsing, which changes the meanings of some characters before GREP ever sees them. Putting double quotes around the expression may help, but it doesn't avoid all problems.

If your regex begins with a minus (-) or slash (/), GREP will try to interpret it as an option. Example: if you're searching for the string "-in-law", GREP will think you're trying to turn on the options /I, /N, and so on. To avoid this problem, use a leading backslash (\-in-law).

If your regex contains certain special characters like <, =, and |, DOS will give those characters their special DOS meaning and GREP will never see them. So you must use special "escape sequences" to represent those characters in a regex on the command line:

        instead of          you can use any of
        < (less)            \l   \60    \0x3C   \074
        > (greater)         \g   \62    \0x3E   \076
        | (vertical bar)    \v   \124   \0x7C   \0174
        " (double quote)    \"   \34    \0x22   \042
        , (comma)           \c   \44    \0x2C   \054
        ; (semicolon)       \i   \59    \0x3B   \073
        = (equal)           \q   \61    \0x3D   \075
        (space)             \s   \32    \0x20   \040
        (tab)               \t   \9     \0x09   \011
        (escape)            \e   \27    \0x1B   \033

You can enter any character as a numeric sequence, not just the special characters in the above list. Use decimal, hex (leading 0x), or octal (leading zero). Example: capital A would be \65, \0x41, or \0101.
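
The three numeric forms are easy to generate mechanically. The sketch below is a hypothetical helper, not part of GREP: the name numeric_escapes and the two-digit hex padding are choices made for this illustration, and it assumes the character's code is at most 255.

```python
def numeric_escapes(ch):
    """Return the decimal, hex, and octal escapes for one 8-bit character."""
    code = ord(ch)
    if code > 255:
        raise ValueError("only 8-bit characters can be written this way")
    decimal = f"\\{code}"          # plain decimal, e.g. \65
    hexadecimal = f"\\0x{code:02X}"  # leading 0x marks hex, e.g. \0x41
    octal = f"\\0{code:o}"         # leading zero marks octal, e.g. \0101
    return decimal, hexadecimal, octal


for ch in ["A", "<", "\t"]:
    print(repr(ch), *numeric_escapes(ch))
```

For capital A this prints \65, \0x41, and \0101, the same three forms given in the example above.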

Finally, if your regex contains 8-bit characters, Microsoft's 32-bit startup code (not DOS) will translate these characters from a DOS character set to a Windows character set, which is probably not what you want. To avoid this problem, either enter the regex in a file (/Ffile), let GREP prompt you to enter it from the keyboard (/F-), or use the numeric sequences to enter characters. Example: In a regex on the command line, instead of actually typing the character é, enter it as \233 or \0xE9 or \0351.

Remember, the rules in this section are required only to get around parsing problems on the command line. These escape sequences are not needed, and don't work, when you use the /F option to enter regexes in a file or from the keyboard.
 


Return Values (ERRORLEVEL)


By default, GREP will return one of the following values to DOS, and you can test the return value with IF ERRORLEVEL in a batch file.
 
        255   bad option, or other error on the command line or regex
        254   specified file not available
        253   insufficient memory: try reducing values specified with the /P option or /W option, or use GREP32 if possible
        128   program error in expanding a regex
        2     help message displayed (/? option, or nothing specified on the command line)
        0     program ran to completion (whether or not there were any matches)
 

You might want to use GREP in a batch file or a makefile and take different actions depending on whether matches were found or not. To do this, use the /0 or /1 option. With the /1 option, GREP returns these values of ERRORLEVEL:
        0       no matches were found
        1       one or more matches were found
        2-255   as above
 
/0 is the opposite: it returns these ERRORLEVEL values:
        0       one or more matches were found
        1       no matches were found
        2-255   as above
 
In other words, the /0 or /1 option lets you tell GREP which value to return if matches are found.
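
The three tables above can be summarized as one small lookup. The sketch below is only a model of the documented values, written in Python for illustration; the names ERRORS and describe_errorlevel are hypothetical, and a real batch file would test the value with IF ERRORLEVEL instead.

```python
# Error values listed in the first table; they apply with or without /0 or /1.
ERRORS = {
    255: "bad option, or other error on the command line or regex",
    254: "specified file not available",
    253: "insufficient memory",
    128: "program error in expanding a regex",
    2: "help message displayed",
}


def describe_errorlevel(code, option=None):
    """Interpret a return value; option is None (default), "/0", or "/1"."""
    if option not in (None, "/0", "/1"):
        raise ValueError("option must be None, '/0', or '/1'")
    if code in ERRORS:
        return ERRORS[code]
    if option is None:
        return "program ran to completion" if code == 0 else "not documented"
    if code not in (0, 1):
        return "not documented"
    # /0 returns 0 when matches are found; /1 returns 1 when matches are found.
    found_code = 0 if option == "/0" else 1
    return "matches found" if code == found_code else "no matches found"


for option in (None, "/0", "/1"):
    print(option, [describe_errorlevel(code, option) for code in (0, 1, 2)])
```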
 


Problems


If an input line contains a NUL character (ASCII 0), GREP will ignore any later characters on that line. A text file should never contain a NUL character, but if it does you can read it by using the /R option.

GREP's regexes are slightly different from the UNIX flavor. Specifically, to accommodate DOS command-line parsing, GREP defines quite a few more escape characters like \c and \s, as well as numeric escapes. On the other hand, GREP regexes do not yet include the quantifiers ? and {m,n}, subexpressions (...), and alternatives |. They will probably be in the next release, which will be free to registered users.
 


What's New in 5.3?


GREP release 5.0 (May 2000) was a major overhaul. There were other improvements, but the biggest single change was the ability to search binary files. Release 5.1 (later that month) fixed a bug; release 5.2 (Jan 2001) made some minor improvements. The complete revision history is available as a separate document.

Release 5.3, 17 April 2001

New features:

Other changes:

Release 5.31, 18 April 2001

Unfortunately, a bug was introduced in release 5.3: under certain circumstances, GREP got confused about whether it was working from standard input or input files. This release corrects that bug, with my apologies to everyone who downloaded the buggy 5.3.

Release 5.32, 20 May 2001

This release fixes a bug: if you specified the current directory on another disk, such as "d:*.htm", GREP took that as the root directory, "d:\*.htm". Apparently no one but the program author ever does such a thing!

Release 5.33, 19 Aug 2001

This is a repackaging for Simtel; there are no significant functional changes.