Thibaud explains… Abusing the C Preprocessor (IOCCC winner, 1986)

Thibaud Poncin
Oct 19, 2020
Preprocessors need love too!

The IOCCC

Let me start by telling you about a wonderful *flinch* bunch *flinch-flinch* of people *flinch*: the IOCCC.

That thing, short for International Obfuscated C Code Contest, is a bunch of thick-glassed, scruffy-bearded, short-sleeve-shirt-wearing nerds who have met (almost) every year since 1984 to elect the worst offender when it comes to writing absolutely incomprehensible (but technically correct!) C programs.

But since I’m currently wearing glasses and a beard, and since I don’t mind short sleeves on a hot summer day (and frankly, if you’re reading this, neither should you), let’s proceed to the code from the winner of the 1986 edition, Jim Hague.

Evidence #1: The murder weapon

#define DIT (
#define DAH )
#define __DAH ++
#define DITDAH *
#define DAHDIT for
#define DIT_DAH malloc
#define DAH_DIT gets
#define _DAHDIT char
_DAHDIT _DAH_[]="ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3d2f16g7c8a90l?e'b.s;i,d:"
;main DIT DAH{_DAHDIT
DITDAH _DIT,DITDAH DAH_,DITDAH DIT_,
DITDAH _DIT_,DITDAH DIT_DAH DIT
DAH,DITDAH DAH_DIT DIT DAH;DAHDIT
DIT _DIT=DIT_DAH DIT 81 DAH,DIT_=_DIT
__DAH;_DIT==DAH_DIT DIT _DIT DAH;__DIT
DIT'\n'DAH DAH DAHDIT DIT DAH_=_DIT;DITDAH
DAH_;__DIT DIT DITDAH
_DIT_?_DAH DIT DITDAH DIT_ DAH:'?'DAH,__DIT
DIT' 'DAH,DAH_ __DAH DAH DAHDIT DIT
DITDAH DIT_=2,_DIT_=_DAH_; DITDAH _DIT_&&DIT
DITDAH _DIT_!=DIT DITDAH DAH_>='a'? DITDAH
DAH_&223:DITDAH DAH_ DAH DAH; DIT
DITDAH DIT_ DAH __DAH,_DIT_ __DAH DAH
DITDAH DIT_+= DIT DITDAH _DIT_>='a'? DITDAH _DIT_-'a':0
DAH;}_DAH DIT DIT_ DAH{ __DIT DIT
DIT_>3?_DAH DIT DIT_>>1 DAH:'\0'DAH;return
DIT_&1?'-':'.';}__DIT DIT DIT_ DAH _DAHDIT
DIT_;{DIT void DAH write DIT 1,&DIT_,1 DAH;}

No other word for it, I’m afraid: that thing is absolutely terrible. Believe it or not, this code translates a string you give it on stdin into Morse code. Don’t believe me? Let’s compile it and try it out, ignoring the terrifying-ish warnings you get when compiling ancient K&R-style code with a modern compiler expecting ANSI C or newer.

That ugly-ass code does something after all!

Ok… so how does it work? We can see a bit more clearly through this by getting rid of all of our preprocessor macros with a quick gcc -E on our file, which gives us this:

char _DAH_[]="ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3d2f16g7c8a90l?e'b.s;i,d:"
;main ( ){char
* _DIT,* DAH_,* DIT_,
* _DIT_,* malloc (
),* gets ( );for
( _DIT=malloc ( 81 ),DIT_=_DIT
++;_DIT==gets ( _DIT );__DIT
('\n') ) for ( DAH_=_DIT;*
DAH_;__DIT ( *
_DIT_?_DAH ( * DIT_ ):'?'),__DIT
(' '),DAH_ ++ ) for (
* DIT_=2,_DIT_=_DAH_; * _DIT_&&(
* _DIT_!=( * DAH_>='a'? *
DAH_&223:* DAH_ ) ); (
* DIT_ ) ++,_DIT_ ++ )
* DIT_+= ( * _DIT_>='a'? * _DIT_-'a':0
);}_DAH ( DIT_ ){ __DIT (
DIT_>3?_DAH ( DIT_>>1 ):'\0');return
DIT_&1?'-':'.';}__DIT ( DIT_ ) char
DIT_;{( void ) write ( 1,&DIT_,1 );}

…and already that looks a tad more palatable. Let’s clean this up by removing unneeded spaces, reindenting the code, and reordering the functions we’ve just unearthed in this mess.

_DAH(DIT_)
{
    __DIT(DIT_ > 3 ? _DAH(DIT_ >> 1) : '\0');
    return DIT_ & 1 ? '-' : '.';
}
__DIT(DIT_) char DIT_;
{
    (void)write(1, &DIT_, 1);
}
char _DAH_[]="ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3d2f16g7c8a90l?e'b.s;i,d:";
main()
{
    char *_DIT, *DAH_, *DIT_, *_DIT_, *malloc(), *gets();

    for (_DIT = malloc(81), DIT_ = _DIT++; _DIT == gets(_DIT); __DIT('\n'))
        for (DAH_ = _DIT; *DAH_; __DIT(*_DIT_ ? _DAH(*DIT_) : '?'), __DIT(' '), DAH_++)
            for (*DIT_ = 2, _DIT_ = _DAH_; *_DIT_ && (*_DIT_ != (*DAH_ >= 'a' ? *DAH_ & 223 : *DAH_)); (*DIT_)++, _DIT_++)
                *DIT_ += (*_DIT_ >= 'a' ? *_DIT_ - 'a' : 0);
}

Chi va piano va sano e va lontano: go slowly and you go safely and far. We seem to be on the right track to end up with understandable code eventually.

That __DIT function looks a lot like it’s writing a char to stdout. Let’s rename it _putchar for better legibility. While we’re there, the other function looks like it’s the one doing the actual translation into Morse (doing so recursively, yay…). Let’s say this one is _translate. We’ll rename a few variables at the same time. _DAH_ becomes ascii_array.

_translate(DIT_)
{
    _putchar(DIT_ > 3 ? _translate(DIT_ >> 1) : '\0');
    return DIT_ & 1 ? '-' : '.';
}
_putchar(DIT_) char DIT_;
{
    (void)write(1, &DIT_, 1);
}
char ascii_array[]="ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3d2f16g7c8a90l?e'b.s;i,d:";
main()
{
    char *_DIT, *DAH_, *DIT_, *_DIT_, *malloc(), *gets();

    for (_DIT = malloc(81), DIT_ = _DIT++; _DIT == gets(_DIT); _putchar('\n'))
    {
        for (DAH_ = _DIT; *DAH_; _putchar(*_DIT_ ? _translate(*DIT_) : '?'), _putchar(' '), DAH_++)
        {
            for (*DIT_ = 2, _DIT_ = ascii_array; *_DIT_ && (*_DIT_ != (*DAH_ >= 'a' ? *DAH_ & 223 : *DAH_)); (*DIT_)++, _DIT_++)
            {
                *DIT_ += (*_DIT_ >= 'a' ? *_DIT_ - 'a' : 0);
            }
        }
    }
}

Ok that’s looking a lot better now.

Let’s get away from the code for a second and look at the Morse code binary tree. Morse encodes letters into dots and dashes, and you can follow this tree to find the match for every single letter of the alphabet. Every time the tree branches left, add a dot. Every time it branches right, add a dash. Following this, E is ., A is .-, and X would be -..-

Wait a second. OH MY GOD LOOK AT IT!! That’s our ascii_array !!

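If we read this tree from left to right, top to bottom, we get ETIANMSURWDK… exactly the same string as in our code, with lowercase letters standing in for the empty spots: 'a' marks a single unused node, and, judging by the innermost loop, later letters such as 'b' or 'd' also skip that many extra unused nodes. That’s a clue towards understanding how this code works.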

_translate(c)
{
    _putchar(c > 3 ? _translate(c >> 1) : '\0');
    return c & 1 ? '-' : '.';
}
_putchar(c) char c;
{
    (void)write(1, &c, 1);
}
char ascii_array[]="ETIANMSURWDKGOHVFaLaPJBXCYZQb54a3d2f16g7c8a90l?e'b.s;i,d:";
main()
{
    char *source_string, *string_copy, *c, *ascii_copy, *malloc(), *gets();

    for (source_string = malloc(81), c = source_string++; source_string == gets(source_string); _putchar('\n'))
    {
        for (string_copy = source_string; *string_copy; _putchar(*ascii_copy ? _translate(*c) : '?'), _putchar(' '), string_copy++)
        {
            for (*c = 2, ascii_copy = ascii_array; *ascii_copy && (*ascii_copy != (*string_copy >= 'a' ? *string_copy & 223 : *string_copy)); (*c)++, ascii_copy++)
            {
                *c += (*ascii_copy >= 'a' ? *ascii_copy - 'a' : 0);
            }
        }
    }
}

And this will be the code in its final form.

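The first of the 3 nested for loops reads a line typed on stdin into source_string (gets returns its argument on success, so the loop stops at end of input). The second loop walks that line one char at a time and writes each one in Morse by calling _translate on the number stored in *c, or writes '?' when the char is missing from ascii_array. That function is where the real bits of bitwise wizardry live: it recurses on c >> 1 to print the earlier symbols, then returns '-' or '.' depending on the lowest bit. The third loop is the lookup: it uppercases the current char (& 223 clears the lowercase bit) and scans ascii_array for it, counting in *c (starting at 2, and jumping ahead on the lowercase placeholders) so that *c ends up as the char’s position in the Morse tree.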

Et voilà!
