Sunday, April 24, 2022

A Sidebar On Programming Languages

I’ve been around software a good long time now. Yesterday, I came across the following tirade about C, and there are a few things in it I want to address.

C Isn’t A Programming Language Any More

I’m not going to go through the author’s complaints point by point and attempt to rebut them, because the problems run deeper: the assumptions being made don’t square with the realities of how software and hardware have actually evolved over the last <mumble> years.

First, the headline itself annoys me, because it misses the point entirely.  C remains a programming language - and it has all of the hallmarks of one:  a defined syntax, semantics, and expected behaviours. C isn’t defined by the presence or absence of specific libraries, although a (more or less) standardized set of libraries eventually did evolve, and by the 1980s there was a fairly well defined “C Standard Library”.  

You can have all the complaints you want about the C libraries as they have evolved, but none of those complaints change the validity of C as a programming language. Most of the complaints about C in the article really focus on two areas:  the evolution of fundamental types in C, and the state of the C libraries. 

I want to point a few things out here. C has been around a long time, and its origins are actually quite specialized. It was created as the language in which to write an early version of the UNIX operating system.

This fact is important.  At its core, C is a systems programming language.  It’s meant to be “one step above the hardware”, and it’s really damn good at that. The semantics of the first versions of C were tightly coupled to the hardware architecture that was emerging at Digital Equipment Corporation - and if you want to understand that, spend a little quality time learning about the PDP-11, an architecture that has influenced processor design to this day.  (* For the yabbut crowd, I’m fully aware that the first version of UNIX was written for a PDP-7 - but that’s a story for another day *)

All of that takes place in the late 1960s and early 1970s.

Yes, a lot of C had “implementation defined” behaviour from the outset - because actual implementations of C had to account for all sorts of very specific processor quirks: oddball word sizes (the idea of 8/16/32/64 bit words being standard is fairly recent), different memory hardware architectures, and so on. So C was always loosely defined by design, so that someone building a C compiler for a given processor could legitimately bend the core parts to make the language work for their hardware.

By the time ANSI started to define a standard for C in the 1980s, most processors were using 8/16/32 bit word sizes, and that got baked into the first ANSI standards. 64 bit designs were a fair ways off, so they sort of waved their hands at the subject and said “yeah, here’s a path to 64 bit types”, but it wasn’t particularly clear, and wouldn’t be until the emergence of the DEC Alpha chip in the 90s.  Even then, the DEC version of C left it a bit loose, and things didn’t really settle down until AMD released its 64 bit extension to the Intel architecture. 

Then there are the runtime libraries, and the Application Binary Interface (ABI) through which programs call into them.  The evolution of those is even more complex - in part because they grew out of the development of UNIX, and UNIX needs drove them for decades.  Early ports of the standard libraries to MS-DOS and Windows were haphazard affairs at best, and then there were Microsoft’s attempts to create their own versions (anyone else remember WinSock? *shudders*).

Over time, the various libraries have changed and evolved, and been ported across an enormous number of platforms - both hardware and software. That’s damned hard work to do, and it inevitably introduces variation and complexity. I remember in the 90s porting a major application suite to HP-UX and discovering that HP had badly botched their memory management infrastructure. Likewise, the release of Solaris in the mid-late 90s was a boondoggle of screwed-up implementation issues in some very fundamental libraries.

Bear in mind, both HP and Sun had excellent teams of developers working on their platform, and they still released some utterly appalling bugs into the wild. Porting complex software across platforms is damned hard work, and it’s even harder when you start playing around with changing fundamentals of the hardware architecture like machine word size. 

For better or worse, the “standard libraries” associated with C and UNIX have evolved over time, and sometimes that evolution has been messy (as it is in nature …).  Parts of them are ugly and could do with a major cleanup.  But undertaking such a cleanup isn’t easy either. You have to decide what you’re willing to shed support for, and you have to be prepared to be ruthless when doing so.

Right now, we live in an era where “UNIX-like” systems architectures running on Intel processors are commonplace.  But even that is changing, with the dominance of ARM architecture in mobile devices (and creeping into desktops as well with Apple’s A and M series system chips).  

But, coming back to the complaints of the article I mentioned previously:  C still remains a programming language regardless of the state of the libraries.  That doesn’t change, because the definition of the language is separate from the libraries.  C remains what it always has been - a relatively low-level language intended for writing operating systems. It was never intended as a language for writing applications.

Has the complexity of the libraries escalated over time?  Yes - of course it has.  Is that painful in places?  Yup. Of course it is.  I spent much of my software career migrating a major chunk of software across different OS platforms - it’s hard work.  Yes, it’s frustrating at times when functionality changes, or when you discover that what you expected things to do isn’t what they actually do.  Get over it - we’re a long ways from having programming languages that are completely independent of the hardware and OS platforms - if that’s even possible.  

I suspect that you won’t get a “clean break” re-design anytime soon.  Even if an astoundingly perfect new hardware architecture were to emerge tomorrow, and require huge changes to actually work, the fact remains that the existing libraries still form a foundation of expectations that programmers are used to, and likely as not, that new chip would soon find the current libraries ported to it.  Inertia is a hell of a thing. 
