Parsing Text Files Line by Line

When working with code, parsing text files is a common task. Typical examples include interpreting commands from a scripting language, reading records from a dataset, loading configuration parameters, and applying automated changes to source code. Text files are generally human-readable and organized into lines, so line-by-line parsing is very common, and most programming languages provide a relatively easy way to read a text file one line at a time. The following example shows two ways to do this in Perl:

#!/usr/bin/perl
use strict;
use warnings;

my $file_name = "test.txt";

# Read the file in-place
if (open(my $file, '<', $file_name)) {
    # NOTE: The <> operator pulls the next line from a file
    while (my $line = <$file>) {
        # NOTE: $line still has the newline character
        print $line;
    }
    close $file;
}

# Read the entire file into a variable. This can be faster than in-place
# but uses more memory.
if (open(my $file, '<', $file_name)) {
    # NOTE: Localizing the '$/' global variable (the input record separator)
    # leaves it undefined, so the <> operator pulls in the entire file.
    my $file_contents = do { local $/; <$file> };
    close $file;
    # NOTE: \R matches any line ending (requires Perl 5.10 or later)
    my @file_lines = split(/\R/, $file_contents);
    foreach my $line (@file_lines) {
        # NOTE: This time, $line does not have a newline character
        print "$line\n";
    }
}

Perl’s built-in regex features make it a very powerful tool for parsing text files. A Perl script can easily search for lines that match a pattern and perform some operation when a match is found. However, Perl is an interpreted language, so it is often a poor fit for projects that require heavy computation or real-time operation. These tasks are usually better handled by a compiled language like C. The following example shows one way to parse a text file line by line in C:

#include <stdio.h>

#define FILE_NAME "test.txt"

int main(void)
{
    FILE *fin;
    /* NOTE: Line length is limited; longer lines are returned in pieces */
    char line[256];

    /* Open the input file for reading in text mode */
    fin = fopen(FILE_NAME, "r");
    if (fin == NULL) {
        perror(FILE_NAME);
        return 1;
    }

    /* Read one line at a time until end of file */
    while (fgets(line, sizeof(line), fin)) {
        /* NOTE: line normally still contains the end-of-line character */
        printf("%s", line);
    }

    fclose(fin);
    return 0;
}

The C version is only slightly more involved than the Perl version. However, the C standard library has no regular-expression support, so the parsing itself can be considerably more complex to write. Even so, parsing the file directly in C may be the only way to get the required data into the application.
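To give a sense of what regex-free parsing looks like in C, here is a minimal sketch that splits a configuration-style line of the form "key = value" using only standard library calls. The line format, the '#' comment convention, and the helper names trim_in_place and parse_key_value are assumptions made for illustration; they do not come from any particular application.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the trimmed start. */
static char *trim_in_place(char *s)
{
    char *end;

    while (isspace((unsigned char)*s)) {
        s++;
    }
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1])) {
        end--;
    }
    *end = '\0';
    return s;
}

/*
 * Hypothetical helper: split a "key = value" line in place.
 * Returns 1 and sets *key and *value on success.
 * Returns 0 for blank lines, '#' comment lines, lines without '=',
 * and lines with an empty key.
 * The line buffer is modified, and *key and *value point into it.
 */
int parse_key_value(char *line, char **key, char **value)
{
    char *eq;

    assert(line != NULL && key != NULL && value != NULL);

    line = trim_in_place(line);     /* also removes the trailing newline */
    if (*line == '\0' || *line == '#') {
        return 0;
    }

    eq = strchr(line, '=');
    if (eq == NULL) {
        return 0;
    }
    *eq = '\0';                     /* terminate the key at the '=' */

    *key = trim_in_place(line);
    *value = trim_in_place(eq + 1);
    return **key != '\0';
}
```

In the reading loop shown earlier, the printf call could be replaced by a call such as parse_key_value(line, &key, &value), acting on key and value only when it returns 1. Because the helper works in place, no memory is allocated, but the pointers are only valid until the next fgets call overwrites the buffer. The sketch also assumes each line fits in the buffer; a longer line is delivered by fgets in several pieces and would be parsed incorrectly.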

Complete Communications Engineering