Ed is the standard text editor (2024-01-02)

I really enjoy Cory Doctorow’s writings. Lost Cause was great! I’m rereading Attack Surface. Cory is prolific, thorough, and deliberate. It is worth looking at his toolbox to get a glimpse of how he works. When I realized that he uses GNOME’s Gedit text editor I was surprised. How do you write a book every year plus hundreds of articles and use such a primitive tool?

I found the answer in the “Writing in the Age of Distraction” article published in the January 2009 issue of Locus magazine. It turns out Cory is so productive not despite, but because of, his choice of minimalist tooling. In the article Cory advises other writers: “Kill your word-processor” and “use a text-editor, like vi, Emacs, TextPad, BBEdit, Gedit, or any of a host of editors.”

Doctorow recommends these text editors because they are “some of the most venerable, reliable, powerful tools in the history of software (since they’re at the core of all other software) and they have almost no distracting features — but they do have powerful search-and-replace functions.”

Once I understood the principles behind Cory’s preference for Gedit, I wondered whether there is some other robust, open-source UNIX tool out there that is even more minimal than Gedit.

ed came to mind. As is well understood, ed is the standard text editor. Presumably, if you want to call your operating system a UNIX, you need to conform to the Single UNIX Specification (SUS). That would oblige you to include some version of ed in your distro. And so ed runs on everything, even IBM z/OS.


ed was created by Ken Thompson in 1969 at AT&T Bell Labs. The first version was written in assembler. Later versions of ed were implemented in C.

In 1993 Andrew Moore, inspired by the “ed algorithm” as described in Brian W. Kernighan and P. J. Plauger’s book “Software Tools in Pascal” (Addison-Wesley, 1981), recreated the editor within the GNU ecosystem. Apple includes this 1993 version in macOS as well, and it is the one we’ll explore here.

Taking a cursory look at the source code - the editor is essentially 8 fairly small C files:

  1. buf.c - 6,682 bytes
  2. cbc.c - 10,066 bytes
  3. glbl.c - 6,107 bytes
  4. io.c - 8,486 bytes
  5. main.c - 33,050 bytes
  6. re.c - 3,721 bytes
  7. sub.c - 6,859 bytes
  8. undo.c - 4,313 bytes

Before we jump into main.c, which is the largest file (33K), let’s go over a few sample ed commands to get a feel for how the editor works. Much like vi, which originates from ed, ed has 2 modes: command and input. Once launched (ed article.txt to open a file), it is in command mode.

  1. 1p - prints the first line of the opened file
  2. 1i - switches to input mode and inserts new text before the first line
  3. . - when in input mode, a lone dot on its own line ends the text insertion
  4. ,n - prints all lines, each prefixed with its line number
  5. 2s/foo/bar/g - replaces every foo with bar on the second line
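
To make the two modes concrete, here is a minimal sketch in C. It is my own illustration, not code from ed: the names mini_ed and feed_line are made up, the buffer is a small fixed array, and the only command it knows is a (append at the end), terminated by a lone dot.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINES 16
#define MAX_LEN   80

enum mode { COMMAND_MODE, INPUT_MODE };

/* mini_ed: hypothetical editor state, not a structure from ed's source */
struct mini_ed {
    enum mode mode;
    char lines[MAX_LINES][MAX_LEN];
    int nlines;
};

/* feed_line: handle one line typed by the user; 0 on success, -1 on error */
static int feed_line(struct mini_ed *ed, const char *line)
{
    if (ed->mode == INPUT_MODE) {
        if (strcmp(line, ".") == 0) {        /* a lone dot ends input mode */
            ed->mode = COMMAND_MODE;
            return 0;
        }
        if (ed->nlines == MAX_LINES)
            return -1;                        /* buffer full */
        /* snprintf truncates over-long lines instead of overflowing */
        snprintf(ed->lines[ed->nlines++], MAX_LEN, "%s", line);
        return 0;
    }
    if (strcmp(line, "a") == 0) {            /* append: switch to input mode */
        ed->mode = INPUT_MODE;
        return 0;
    }
    return -1;                                /* unknown command */
}
```

The same text means different things depending on the mode: "hello" is stored as a line in input mode, but rejected as an unknown command in command mode.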

Classic C opening:

/* ed: line editor */
int
main(int argc, char *argv[])
{

The comments are fantastic. Their clarity rivals that of Lions’ Commentary. Here are the declarations of global variables as an example:

/* static buffers */
char stdinbuf[1];		/* stdin buffer */
char *shcmd;			/* shell command buffer */
int shcmdsz;			/* shell command buffer size */
int shcmdi;			/* shell command buffer index */
char *ibuf;			/* ed command-line buffer */
int ibufsz;			/* ed command-line buffer size */
char *ibufp;			/* pointer to ed command-line buffer */

/* global flags */
int des = 0;			/* if set, use crypt(3) for i/o */
...
int sigactive = 0;		/* if set, signal handlers are enabled */
int posixly_correct = 0;	/* if set, POSIX behavior as per */
/* http://www.opengroup.org/onlinepubs/009695399/utilities/ed.html */

...
int lineno;			/* script line number */
const char *prompt;		/* command-line prompt */
const char *dps = "*";		/* default command-line prompt */

One interesting find: early in main() we see the top: label, and further down in the source is the "goto top;" statement. Many of us have been taught that goto should not be used. This started in 1968 with a letter by Edsger Dijkstra titled “Go To Statement Considered Harmful”. But goto top; is here nevertheless. It seems fairly safe and unambiguous, as it is used in the context of parsing command-line arguments.
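
To illustrate why a backward jump can be reasonable in argument parsing, here is a minimal sketch. It is not the code in main.c: the parser is hand-rolled, and struct opts and parse_args are hypothetical names. The assumption is that a lone "-" acts as the historical spelling of -s and that more options may follow it, so parsing simply restarts.

```c
#include <string.h>

/* opts: hypothetical container for the parsed settings */
struct opts {
    int scripted;          /* set by -s or by a lone "-" */
    const char *prompt;    /* set by -p PROMPT */
    const char *file;      /* first non-option argument, if any */
};

/* parse_args: returns 0 on success, -1 on an unknown or incomplete option */
static int parse_args(int argc, char *argv[], struct opts *o)
{
    int i = 1;

top:
    /* consume flags of the form -s or -p PROMPT */
    while (i < argc && argv[i][0] == '-' && argv[i][1] != '\0') {
        if (strcmp(argv[i], "-s") == 0) {
            o->scripted = 1;
            i++;
        } else if (strcmp(argv[i], "-p") == 0 && i + 1 < argc) {
            o->prompt = argv[i + 1];
            i += 2;
        } else {
            return -1;
        }
    }
    /* a lone "-" also means scripted; anything after it is parsed again */
    if (i < argc && strcmp(argv[i], "-") == 0) {
        o->scripted = 1;
        i++;
        if (i < argc)
            goto top;
    }
    if (i < argc)
        o->file = argv[i];
    return 0;
}
```

The jump only ever goes back to one clearly named spot, and i grows on every pass, so the loop cannot run forever. The alternative would be an extra outer loop with a flag, which is arguably harder to read.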

After the buffer is initialized and the file is checked, there comes the infinite loop:

for (;;) {

Within the loop there is the inevitable invocation of exec_command().

And as the comment indicates - this is where the command we have given ed is executed. One such example would be ",n", which asks ed to print all lines with a leading line number.

The exec_command() function is one giant switch statement. What does ed do when we enter ",n"? Let’s search for "case 'n':" in the code:

case 'n':
	if (check_addr_range(current_addr, current_addr) < 0)
		return ERR;
	GET_COMMAND_SUFFIX();
	if (display_lines(first_addr, second_addr, gflag | GNP) < 0)
		return ERR;
	gflag = 0;
	break;

The essence here seems to be the display_lines() function. Presumably, by the time we reach the 'n' case, first_addr and second_addr have already been collected, so we know the first and last lines of the range we want to print with line numbers.
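
A minimal sketch of that two-step idea, addresses first and command character second, might look like this. It is not ed's parser: parse_range and print_flags are hypothetical helpers, only plain numbers and the comma are understood, and while GNP appears in the excerpt above, GPR and both bit values are assumptions.

```c
#include <ctype.h>
#include <stdlib.h>

#define ERR (-1)
#define GPR 0x1   /* plain print (assumed name and value) */
#define GNP 0x2   /* print with line numbers (value assumed) */

/* parse_range: read an optional "N", "N,M", ",M" or "," prefix from cmd,
 * store the resulting range and return a pointer to the command character */
static const char *parse_range(const char *cmd, long current, long last,
                               long *first, long *second)
{
    char *end;
    int have_first = 0;

    *first = *second = current;              /* no address: the current line */
    if (isdigit((unsigned char)*cmd)) {
        *first = *second = strtol(cmd, &end, 10);
        cmd = end;
        have_first = 1;
    }
    if (*cmd == ',') {
        cmd++;
        if (!have_first)
            *first = 1;                      /* ",..." starts at line 1 */
        if (isdigit((unsigned char)*cmd)) {
            *second = strtol(cmd, &end, 10);
            cmd = end;
        } else if (!have_first) {
            *second = last;                  /* a bare "," means first to last */
        }
    }
    return cmd;
}

/* print_flags: map the command character to the flag handed to the printer */
static int print_flags(int c)
{
    switch (c) {
    case 'p': return GPR;
    case 'n': return GNP;
    default:  return ERR;
    }
}
```

With a 4-line buffer, ",n" yields the range 1 to 4 and the GNP flag, which is all a display function needs to know.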

The function display_lines() is tiny and well structured:

/* display_lines: print a range of lines to stdout */
int
display_lines(long from, long to, int gflag)
{
	line_t *bp;
	line_t *ep;
	char *s;
	if (!from) {
		errmsg = "invalid address";
		return ERR;
	}
	ep = get_addressed_line_node(INC_MOD(to, addr_last));
	bp = get_addressed_line_node(from);
	for (; bp != ep; bp = bp->q_forw) {
		if ((s = get_sbuf_line(bp)) == NULL)
			return ERR;
		if (put_tty_line(s, bp->len, current_addr = from++, gflag) < 0)
			return ERR;
	}
	return 0;
}

The function is given a from and a to line address. It prints every line in that range and, because the 'n' case passed the GNP flag, shows the line number for each one. In my case, when I entered “,n”, ed showed me the full contents of my test file:

,n
1       this is the new first line
2       this was the first line, but NOW it is the second one
3       two
4       three

Despite the excellent comments, in typical C fashion, we have some extremely short variable names. I am not a fan of single or even double-letter variable names. In some cases, when the pattern is well known it is acceptable, for example “for (i=0; … ” But in the case of bp and ep — we are left guessing what these may mean. Since bp is a struct for the “from” line and ep for the “to” line we could assume these perhaps mean:

  • bp = Beginning Point (from)
  • ep = End Point (to)

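The loop "for (; bp != ep; bp = bp->q_forw) {" then moves the beginning point forward until it reaches the end point. Note that ep is fetched for INC_MOD(to, addr_last), which looks like the line just past to, so the to line itself is still printed. That makes sense, as we want to start with from and print all the lines up to and including to.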
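
Here is a minimal sketch of that traversal, assuming the buffer is a circular list whose header node sits at address 0. The INC_MOD definition is how I understand the macro from ed's headers, so treat it as an assumption. struct node, get_node and count_range are hypothetical stand-ins for ed's line_t and helpers.

```c
/* INC_MOD as I understand it: one past l, wrapping to 0 after the last line */
#define INC_MOD(l, k) ((l) + 1 > (k) ? 0 : (l) + 1)

/* node: simplified stand-in for ed's line_t, forward link only */
struct node {
    struct node *q_forw;
};

/* get_node: walk n steps forward from the header node (address 0) */
static struct node *get_node(struct node *head, long n)
{
    struct node *p = head;

    while (n-- > 0)
        p = p->q_forw;
    return p;
}

/* count_range: visit lines from..to inclusive and return how many were seen */
static long count_range(struct node *head, long last, long from, long to)
{
    struct node *bp = get_node(head, from);
    struct node *ep = get_node(head, INC_MOD(to, last)); /* one past "to" */
    long n = 0;

    for (; bp != ep; bp = bp->q_forw)
        n++;
    return n;
}
```

When to is the last line, INC_MOD wraps to 0 and ep becomes the header node, so the loop stops right after the final line without any special case.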

The call "s = get_sbuf_line(bp)" fetches a pointer to a null-terminated string for each line we ‘step’ on, and put_tty_line(s…) prints it.

Let’s take a look at the printer:

/* put_tty_line: print text to stdout */
int
put_tty_line(const char *s, int l, long n, int gflag)
{

This is defined in the io.c file (we were in main.c until now).

The function put_tty_line() is generic and used by a few ed commands. So when we use ",n", the first if statement will know that this is an ‘n’ and not just a ‘p’ (print without line numbers) and will prepend the line number to each line:

		printf("%ld\t", n);

The loop "for (; l--; s++) {" is where the line’s characters are printed. We don’t need a separate loop counter or index variable (i) here. Because of the single-letter name it is not crystal clear, but l is the length of the current line we are working on. The condition l-- tests l and then decrements it by 1, so the body runs once per character and each pass outputs one character to the TTY. The loop ends once l has reached 0.
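
The idiom is easy to try in isolation. In this small sketch, copy_n is a hypothetical helper of mine that copies into a buffer instead of printing, so that the result can be inspected:

```c
#include <stddef.h>

/* copy_n: copy exactly l bytes from s into dst and NUL-terminate the result.
 * l must be >= 0 and dst must have room for l + 1 bytes. */
static size_t copy_n(char *dst, const char *s, int l)
{
    size_t n = 0;

    for (; l--; s++)        /* test l, then decrement; ends when l was 0 */
        dst[n++] = *s;
    dst[n] = '\0';
    return n;
}
```

Because the loop is driven by the length and not by a terminating NUL, it also copes with lines that contain NUL bytes. The flip side is that a negative l would never hit 0 on the way down, so the caller has to guarantee a sane length.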

The several levels of nested if statements perform some minimal sanitization, so that we don’t print control characters or anything else our TTY can’t handle, and eventually resort to putchar().

And putchar() is a C library function that writes the character given by its int argument to stdout.

The function will also make sure that we don’t go beyond the 72 columns defined back in main.c:

int cols = 72;                          /* wrap column */
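
To get a feel for the escaping and wrapping, here is a simplified sketch. It is not put_tty_line() itself: render_line is a hypothetical function that writes into a caller-supplied buffer instead of stdout, escapes every non-printable byte as a three-digit octal sequence, and always wraps, which may be more eager than what ed does.

```c
#include <stdio.h>

/* render_line: write l bytes of s into out (capacity outsz), escaping
 * non-printable bytes and breaking the output after cols columns.
 * Returns the number of bytes written, not counting the final NUL. */
static size_t render_line(char *out, size_t outsz, const char *s, int l, int cols)
{
    size_t n = 0;
    int col = 0;

    if (outsz == 0)
        return 0;
    for (; l--; s++) {
        unsigned char c = (unsigned char)*s;
        int printable = c > 31 && c < 127 && c != '\\';
        int width = printable ? 1 : 4;       /* an escape is 4 columns wide */

        if (n + (size_t)width + 3 > outsz)   /* room for wrap, char and NUL */
            break;
        if (col + width > cols) {            /* wrap: backslash, then newline */
            out[n++] = '\\';
            out[n++] = '\n';
            col = 0;
        }
        if (printable)
            out[n++] = (char)c;
        else
            n += (size_t)snprintf(out + n, outsz - n, "\\%03o", (unsigned int)c);
        col += width;
    }
    out[n] = '\0';
    return n;
}
```

With cols set to 4, the six characters abcdef come out as abcd, a backslash and a newline, then ef. A tab in the input comes out as the four characters \011.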

There is so much more interesting and well written code in this tiny repo. The ed editor is truly a masterpiece. We will come back to this source code when we examine some of the modern editors and look at how these have been influenced by ed.

Happy Code Reading!


Appendix