All posts by Marie-Eve Tremblay

Tough love: the programmer’s love-hate relationship with her languages

For the first time since the wedding I was away from my husband; we were both visiting our families for Easter. He wasn’t there to “brainwash” me about his project, but a few days before leaving he convinced me to read The Design and Evolution of C++ by Bjarne Stroustrup.

The motivations behind C++

I’m just starting chapter 2 (which is the third one, the first being chapter 0 of course). I found out that the motivations for creating C++ were founded on the love-hate relationship he had with Simula, BCPL and others. Even though he doesn’t want to compare his creation to other modern languages, he doesn’t hesitate to vigorously criticize those he used before. Pascal gets no love; his feelings about the others are less clear-cut. Here are some excerpts:

“I had found Pascal’s type system to be worse than useless – a straitjacket that caused more problems than it solved by forcing me to warp my design to suit an implementation-oriented artifact.
(…)
The features of Simula were almost ideal for the purpose, and I was particularly impressed by the way the concepts of the language helped me think about the problems in my application
(…)
The implementation of Simula, however, did not scale the same way. As a result, the project came close to a disaster. My conclusion at that time was that the Simula implementation (…) was inherently unsuitable for larger programs. Link time for separately compiled classes was abysmal (…).

BCPL makes C look like a very high-level language and provides absolutely no type checking or run-time support.
(….)
I swore never again to attack a problem with tools as unsuitable as those I had suffered (…).”

Different state of mind, different languages

Some languages, like Lisp, are creations of design and purity. They are interesting for the intellectual stimulation, but few people use them in real life (or are happy to do so). They are like a nice but small canvas on which you can only express a part of yourself. After all, Lisp was designed to express mathematical concepts, not to communicate with a machine.

There are also the “design by committee” languages. Composed by a team of designers and based on compromise, the result is a hybrid of different styles. Having a keyword for almost anything is a big sign of design by committee, because it is an easy solution for a group of people to agree on. Usually the result can do a wide range of things but doesn’t do any of them well. COBOL and Ada are two examples; I don’t know anyone who has a really positive opinion of them.

I have the feeling that real-life experience forges language design oriented toward, well… real life. When you have faced a wide variety of problems and gone to the limits of your favorite languages, you develop a devoted love for the good things and a growing frustration with the itches, especially if you have an idea of how to fix them. Programming language designers who fall into this category tend to create things people are more inclined to use (or love using).

My own love story

Under my last post, my friend Michel’s comment was right. I am very harsh on garbage collection, and on virtually anything that hurts performance. This is probably professional bias. After a few years of coding with performance as a priority, it’s hard to quit. In fact I don’t want to quit; this has made me a better programmer.

My first serious experience with code was with C++. It wasn’t an easy gal; it took a while to master the basic concepts. What really hooked me was the wide range of possibilities. I was frustrated by the sometimes non-intuitive, ambiguous syntax, by the compiler’s bugs and performance, and by ugly Microsoft stuff like MFC or COM; but I never felt limited in any way. Controls, physics simulation, graphics, AI, photo processing: I went deep into many of those, and the only limitation was the time I had to code. Had it been easier to use, I would probably have spent more time thinking and less time coding. But I still prefer flexibility and control over ease of use.

I loved my experience with Java when I was at university: easy to learn, quick for making simple stuff. But when I got into more serious programming, it couldn’t match my needs. I had outgrown the language. Mostly the problem was performance and control. But I also had some design questions, for example: why do I have to make a class even when it’s not the right design?

I have always had a passion for assembly and down-to-the-metal languages. You totally have the right to judge me crazy for this. I’m not sure why I love them so much. I agree with their reputation: they are hard to master, and it takes a lot of time to do anything with them, no matter how simple. They are extremely non-portable, hence I learned a few: x86, Motorola 680x0, but my favourite was VHDL. Programming the controller’s behaviour directly, managing the electric current directly: how much closer can a programmer get to the machine? You know you are low level when you have to manage the impact of the laws of physics on your program. While doing this slow-paced programming, I developed an unconditional love for the machine and its complexity. I must now reassure you: my husband is not as crazy as I am!

Impact on the C3 design

My husband’s love story with C++ as a programmer went a lot further than mine: metaprogramming with templates and things like that. The C3 design doesn’t compromise on performance, for the exact same reasons that Stroustrup had when building C++. Similar background, similar conclusions.

Having multiple paradigms is probably the only way to allow programmers to fully express themselves. Like he said, each paradigm has its part to play, but it is their combination that really makes things lift off. Supporting a wide range of styles doesn’t have to make programming difficult. Nice design, behavior that defaults the correct way (and can be overridden should you need to), simple and clean syntax: these are elements that can give birth to something as easy to use as Java (even easier), but as powerful as C++. That said, easy on the user side doesn’t mean easy in the compiler implementation.

It is easy for me to say all this; none of it truly exists yet. But when we talk about it (every day, even in bed before sleeping), I feel he has thought of everything and is now polishing aspects I wouldn’t have imagined. I’m not usually easy to impress; I challenge him on every aspect I learn about. I hope this will help improve things. Unlimited passion… all this because we love and hate our programming languages. I feel my relationship with C3 is going to be a wild one!

The value of values

or Why value-based programming is not a bad thing!

Working with values was kept in C++ to maintain compatibility with C. Bjarne Stroustrup also invented RAII, a technique that ensures resources are acquired correctly and properly released. Working with values is more intuitive for beginners and, honestly, it is what we want to do most of the time, keeping pointers for when we really need them.

Values and references both have their uses, and patching the absence of value-based variables with a garbage collector makes your programs slower and less predictable. Even with the optimizations made to garbage collection algorithms and the increase in computer performance, many applications need to minimize overhead and cannot afford to pay at run time for something that can be done at compile time. There is also the question of predictability. The garbage collectors I worked with were non-deterministic: the memory gets cleaned up, but it’s hard to know the exact moment. Sometimes a life depends on a program; some applications in space or health care cannot afford randomness. Serious programming often requires performance and predictability. It will be possible for someone to add a garbage collector as an option, but it shall not be a necessity in C3.

Anyway, garbage collection doesn’t make your program leak-proof. With a collector based on reference counting, for instance, a cycle (a pointer to something that points back to the original) won’t be detected, leaving a nasty leak while the programmer has become used to not caring about those things. There are so many things other than memory that have to be managed; memory doesn’t have to be a special case. Examples: network connections or OS resource handles. Why assume the programmer is an idiot who needs to be protected from herself? Good design doesn’t prevail by limiting control, especially in a creative tool like a programming language.

The solution is to make the use of objects, both as values and as references, easy for general applications while leaving the door open for meticulous management. Value-based programming is easy in C++ and will continue to be in C3, thanks to RAII. Using pointers in C++ has a reputation for being complex, but it becomes easy if you use smart pointers. Most of the time, there are two ways we want our pointers to behave: as non-copyable pointers or as reference-counting pointers. Non-copyable pointers simply disallow multiple pointers to the same object. Reference-counting pointers keep track of the number of pointers to the same object. Both know when the pointed-to object is no longer needed and delete it at that moment. No need for complex manual management! Another example is a clone pointer, which makes a copy of the item instead of pointing to the same one when copied. Other, more custom pointers could also be used. This is where the serious programmer is free to push things down to the metal.
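To make the two pointer behaviors concrete, here is a minimal sketch. It assumes the C++11 standard library types std::unique_ptr (non-copyable) and std::shared_ptr (reference-counting), which are not what this post had at hand when it was written; Texture, make_unique_texture and owners_after_copy are hypothetical names invented for the illustration.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical resource type, used only for illustration.
struct Texture {
    std::string name;
    explicit Texture(std::string n) : name(std::move(n)) {}
};

// Non-copyable pointer: exactly one owner at a time.
// Ownership can be moved to another pointer, but never copied.
std::unique_ptr<Texture> make_unique_texture(const std::string& name) {
    return std::unique_ptr<Texture>(new Texture(name));
}

// Reference-counting pointer: every copy shares the same object,
// which is deleted when the last copy goes away.
long owners_after_copy(const std::shared_ptr<Texture>& original) {
    std::shared_ptr<Texture> copy = original;  // the count goes up by one
    return copy.use_count();
}  // 'copy' is destroyed here and the count goes back down
```

In both cases nobody calls delete by hand: the Texture is released when its last owner goes out of scope.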

Having smart pointers as part of the language standard allows other simplifications for the user. In C++, not redefining the constructor, destructor and copy constructor can be dangerous if your class contains pointers. But in C3, the default behavior will most probably be the one you need, because the type of pointer you use defines its behavior in those situations. Basic language functions won’t return raw pointers (as C++’s new operator does). Other operators, like operator==, could be generated automatically and correctly, which isn’t the case in C++. My husband recently answered this question after seeing it on Stack Overflow.

Why don’t C++ compilers define operator== and operator!= ?

IMHO, there is no “good” reason. The reason there are so many people that agree with this design decision is because they did not learn to master the power of value-based semantics. People need to write a lot of custom copy constructor, comparison operators and destructors because they use raw pointers in their implementation.

When using appropriate smart pointers (like boost::shared_ptr), the default copy constructor is usually fine and the obvious implementation of the hypothetical default comparison operator would be as fine.
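A small sketch of what this answer describes, using std::shared_ptr from C++11 instead of boost::shared_ptr. Document is a hypothetical type, and the memberwise comparison below is only one reading of the “obvious implementation” (it compares the shared pointers themselves, not the text they point to).

```cpp
#include <memory>
#include <string>

// Hypothetical value type whose members all manage themselves.
struct Document {
    std::string title;
    std::shared_ptr<const std::string> body;  // shared, immutable content

    // No copy constructor, assignment operator or destructor is written:
    // the compiler-generated ones copy and release each member correctly.
};

// The memberwise comparison that has to be written by hand.
// Assumption: two documents are equal when their titles match and
// they share the same body object.
bool operator==(const Document& a, const Document& b) {
    return a.title == b.title && a.body == b.body;
}

bool operator!=(const Document& a, const Document& b) {
    return !(a == b);
}
```

Since C++20 a class can ask for this kind of memberwise comparison with a defaulted operator==, which did not exist when the answer was written.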

So this is how memory is never stacked up as a pile of garbage, waiting for another process to collect it. It just manages itself in a very easy way, and the serious programmer continues to be free to invent new ways to manage his pointers. Memory recycles itself without extra effort. Simple programs continue to be easy to implement, and the programmer gets flexibility, performance and control for all possible applications.

Husband’s note:
C++ “concrete type” facilities were only optimized for small objects that are cheap to copy.
C3 makes value-based semantics easy and efficient, even for more complicated objects.

Is creating a programming language an extreme form of Not-Invented-Here Syndrome?

In my last article, I posted a link to Joel Spolsky’s “In Defense of Not-Invented-Here Syndrome”. Justifying modifications to a compiler is something technical that can be demonstrated in a non-subjective way. Creating your own programming language can hardly be justified the same way, since so many languages already exist. Still, I’m coming to the defense of language creation, mostly by describing my husband’s motivations.

The first motivation for writing any code for which you aren’t paid is fun, and that includes compilers. If design is fun to you, there is nothing wrong with trying to design a programming language. In fact, the exercise will make you grow. Trying is the best way to learn, and programming language design and compilers are two of the best ways to get down to the metal, into the deepest layers of the machine. Understanding how things work will always give you a head start compared to learning workarounds by heart. This is true for programming, but also when learning English or French, or when operating a microwave (you can learn what not to put in it, and maybe forget, or you can learn how it works and naturally deduce what is a bad idea to put into it).

Still, there is more to C3 than having fun. It sure began that way, but my husband has outgrown the educational purpose. It all started with a dream of making the perfect language. Of course reality catches up: the more you program, the more you realize the complexity of the task. But the more you program, the more room for improvement you see. There is an amazing number of programming languages out there; some are really good and we have a lot of respect for them, especially C++. But they all have a bunch of little annoyances (sometimes big ones) that leave the door open for improvement and new discoveries. Closing the door would be a very sad thing for creativity, but pushing the limit further: that’s an interesting challenge.

Identifying what doesn’t work is one thing, but thinking of an elegant solution is very different. Over the years, he has been collecting problems in the languages he was using, each time challenging his design. But a good designer also has an eye for good design, even when it’s not his own work. Experience is what shaped this dream into the refined reality of C3.

So here we are. He doesn’t intend to reinvent the wheel; C++ is a big inspiration in his work, and those familiar with it should feel totally comfortable with C3. But he’s making it evolve, mostly by making the syntax leaner. This doesn’t make the language more restrictive. On the contrary, simplifying the rules will leave more room for innovation. Alongside come ideas to make programming more comfortable with better tools, and a dream of a more unified community of programmers (and somewhat of a plan for how to make it happen).

Back to the question, is this a Not-Invented-Here syndrome? Probably… But this is how evolution works right? If everything was perfect, there would be no need to improve it. Only the most innovative persons challenge the very basics of things.

So mostly that’s it: a programmer should start designing his own programming language for fun and for educational purposes. Some people, like my husband, discover along the way that the world could benefit from their innovative ideas and want to push further. There is a long way to go to make this actually happen. Fortunately he isn’t alone: a wife stands by him!

Why would someone build their own compiler?

Many programmers live in a very fluffy world, where there is an API for everything and a nice compiler turns all that nice code into something the machine will interpret correctly. Others will go further, and hack their way through uneasy code. Only a few will reach the end of the darkness, where no compiler fits their needs.

I admit this isn’t a need that comes up often, but once you have faced it, you never want to go back. You have to choose between making your own compiler and a big, ugly, massive hack that you will regret for a long time, telling yourself “if only I had access to the compiler sources, I could change this one little thing and everything would be perfect”.

It happened to me once in my short career as a programmer, back when I was working on porting someone else’s application to a new platform. We were working on a library that handled the generic part of most of our tasks, not an unusual strategy. For each project, we received frequent drops of the client’s code, hence we wanted to minimize the changes to their code. In one particular project, we had a nasty crash, hard to reproduce: it took a while to happen, it seemed random, no specific reproduction step could be identified. The stack trace didn’t give much information of what it was, but we finally identified the source: the memory management system in the client’s code.

The client’s code used a memory pool, which is very common in these kinds of programs. They defined a global new and delete in order to wrap everything into their memory management system. This means everything we created in our library with the global new and delete (int, long, etc.) was also being managed by the memory pool. The buffers were already at their limit, so linking with our library made everything explode. Blindly increasing the size of the buffers solved this problem, but created others. A brief analysis of upcoming projects showed us that a similar problem could happen again. We had to make our library “global new and delete proof”.
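Here is a minimal sketch of why a global new and delete defined in one place captures allocations made everywhere else. The counter g_pool_allocations stands in for the client’s memory pool, and the portlib namespace stands in for our library; both are hypothetical names invented for this illustration, not the real code.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter standing in for the client's memory pool.
std::size_t g_pool_allocations = 0;

// Replacing the global operator new affects the whole program,
// including every library linked into it.
void* operator new(std::size_t size) {
    ++g_pool_allocations;
    if (size == 0) size = 1;
    if (void* memory = std::malloc(size)) return memory;
    throw std::bad_alloc();
}

void operator delete(void* memory) noexcept { std::free(memory); }
void operator delete(void* memory, std::size_t) noexcept { std::free(memory); }

// Stand-in for a library that knows nothing about the pool.
namespace portlib {
int* g_buffer = nullptr;

void init() { g_buffer = new int(0); }  // silently served by the replacement above

void shutdown() {
    delete g_buffer;
    g_buffer = nullptr;
}
}  // namespace portlib
```

Nothing in portlib mentions the pool, yet calling portlib::init() increments the counter: the replacement is chosen at link time for the whole program, and C++ has no way to scope it to a namespace.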

The clean solution would have been to create a new and delete specific to a namespace, but this is impossible to do without modifying the compiler. We didn’t have the budget or the manpower to buy or create a compiler, so we resigned ourselves to another solution. We replaced everything with malloc, added a note to the training program that new and delete were banned, and gave up on integrating other libraries in which we couldn’t make this ugly modification.
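For readers wondering what replacing new with malloc can look like for objects that need a constructor, here is one possible sketch. It is an assumption about the general technique, not the code we actually wrote; portlib, create and destroy are hypothetical names.

```cpp
#include <cstdlib>
#include <new>
#include <utility>

namespace portlib {

// Allocate with malloc and construct in place, so the global
// operator new (and whatever pool sits behind it) is never involved.
// Assumption: T does not need more alignment than malloc provides.
template <typename T, typename... Args>
T* create(Args&&... args) {
    void* memory = std::malloc(sizeof(T));
    if (!memory) throw std::bad_alloc();
    try {
        // Placement new only runs the constructor; it allocates nothing.
        return ::new (memory) T(std::forward<Args>(args)...);
    } catch (...) {
        std::free(memory);
        throw;
    }
}

// Run the destructor by hand, then give the memory back to malloc's heap.
template <typename T>
void destroy(T* object) {
    if (!object) return;
    object->~T();
    std::free(object);
}

}  // namespace portlib
```

Every call site has to use create and destroy instead of new and delete, which is exactly why this kind of change is so invasive and why third-party libraries could not be adapted.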

There are many other reasons to build your own compiler. The Excel team at Microsoft had its own C compiler, allowing them optimizations not possible otherwise and a total control on their software. You can read more about this in this very interesting article from Joel on Software.

My husband wants to make the compiler code open (the license has yet to be defined). Once he has bootstrapped it, he’ll make it available to the world. An important aspect of C3 is that you should not find yourself limited by the language itself. You may need to work hard for something unusual, but everything should be possible (code-wise).

Finding good tools to make greater tools

Programming is usually fun, but there are two types of situations that always make me mad:

  • Spending too much time finding a stupid syntax error because the compiler gives me a very weird error message and points to the wrong place in the code.
  • Having to hack my way out to trace a bug because the debugger doesn’t give me the information I need (or worse, gives me a false answer).

Mostly, as programmers, we want to spend our time creating, and not fighting against our tools.

My husband was using Xcode until recently. Having only Macs at home, this was the obvious choice. But as the codebase continued to grow, he was running into those two situations more and more. Then the search for an alternative started. CodeWarrior doesn’t support the Mac anymore. Anyway, we already overdosed on it in the past, back when we were Mac game programmers! Then came Eclipse; he’s testing it now. Will he stick with it? Will he go back to Xcode? Will something new arise? Or will he simply buy a PC and go back to Visual Studio and its comfortable Visual Assist? The Eclipse debugger seems to be a little better, probably because it uses GDB/MI instead of directly parsing GDB’s output (more info). I expect support for C++ to grow very slowly in Xcode, Apple’s attention being focused on C and Objective-C (a fate very similar to OpenGL’s, because games weren’t a priority for Apple until recently).

No matter how good the tools are, the root of the problem is in the language grammar. The more hacks there are in your parser, the more difficult it will be to give clear error messages or to debug correctly. Leaving backward compatibility behind opens the door to a design that minimizes those problems. In order to do that, my husband always keeps the KISS motto in mind. A nice lean grammar implemented in a simple yet elegant way should have lots of good influence in the future. This is a big part of why he didn’t go for YARD; sure, he loves templates, but sometimes they are just too much for the need.

Clear error messages and precise debugging tools are part of the big dream. It won’t prevent us from making stupid mistakes, but will sure save time when we do.

C3’s first sprint backlog

Lots of our friends, coworkers and cousins had a baby recently. Each of them has this picture of the baby at the hospital wearing a very cute hat with “My first hat” written on it. There is this obsession with babies and the first time they do something: first step, first word, first day at school, first whatever! C3, having no head or feet, can’t have what other children have, but it still has its first something. We have been talking about it for a while, but today marks the beginning of the first sprint, and it comes with a very nice backlog. This is a task list based on the Scrum method, which is gaining popularity in the technology field. Obviously, since my husband is doing this part time, we won’t put an end date on the sprint, but at the end we intend to release something for you, my fellow readers, in order to get your comments.

It was hard for me at first to decide how to list this. In my husband’s head, it took the form of a recursive algorithm, while a backlog should be linear (without ifs or gotos). So here I’m unfolding it for you:

Sprint 1: Build a PEG parser with semantic actions that will be able to regenerate itself
-Parse PEG description file (without semantic actions) – Done
-Display concrete syntactic tree with the console – Done
-Build the abstract syntactic tree – In progress (Done but not tested)
-Display abstract syntactic tree with the console
-Create a generator of all previous steps
-Parse PEG description file with semantic actions
-Build the abstract syntactic tree with semantic actions
-Adapt the generator with semantic actions

This is a proof of concept. If the PEG parser generator can regenerate itself (and again, and again), this will be proof that the algorithm is working. After that he’ll concentrate on the next step: updating the C3 grammar with all the new things he has thought about in the last few years!

Double-sided discovery

Yesterday my husband found out about YARD, “yet another recursive descent”. If you read the description in the article he sent me this morning, you’ll find that it does mostly what I described in my latest post (PEG parser). The problem is, he has already coded most of the features that exist in this library, and there is no point in redoing everything once again. Inspiration, on the other hand, is a very good thing.

The main problem is that YARD doesn’t fully support semantic actions, and pegtl (another library based on YARD) doesn’t work with many compilers because it uses unreleased features of C++0x.

This is further proof that the world needs a unified way to share code. YARD has existed since at least 2004 and yet he didn’t stumble upon it while searching with Google in December. It took an article in Dr. Dobb’s to learn about it. The date: January 2009, five years after the first publication.

Funny thing, the author, Christopher Diggins, lives in the same province, not that far from our place. Almost looks like a trend in our area!

Now coding: parser generator

Thinking of a great design is one thing; making it happen is a step fewer people are able to take. So here I’ll describe what has been done so far, what is currently being implemented, and why my husband chose these steps to begin with. The Spirit parser has been lost and wouldn’t be of any use anyway, so he’s making a fresh start. The first step in making a compiler would be making a parser, which transforms code into an abstract syntax tree. But the first step in making C3 is to make a parser generator.

The compilers of languages currently in use are often built around a context-free grammar. This is a concept taken directly from the Chomsky hierarchy, which describes the different types of formal languages. In the 2000s, Bryan Ford gave a name to a concept that had already been floating around for a while: the parsing expression grammar, or PEG. The Chomsky hierarchy describes how a sentence (or program) should be composed, while the PEG goes the other way, describing how to recognize a correct sentence of the language.

A context-free grammar usually leads to a bottom-up parser, which is very complex. Two well-known bottom-up parser generators are Yacc (yet another compiler compiler) and Bison (the GNU version of Yacc). They reach a level of complexity that makes the generated code hard to understand and almost impossible to debug. On the other hand, there aren’t many top-down parser generators, probably because it’s a lot easier to write the parser directly, as a recursive descent parser. PEGs are formal descriptions of this kind of parser. Spirit was described by its author as using modified context-free grammars, but in fact it was using PEGs before anyone had heard of them!

There are many advantages to using PEGs. A context-free grammar has to consider all possibilities at parsing time, and when facing two possible solutions, it cannot decide between them without a hack. This is called a “conflict”. For example, say several nested “if”s appear in the code, followed by an “else”. The context-free grammar doesn’t know which “if” to associate the “else” with; they are all possible solutions in the grammar definition. A hack is necessary to associate this “else” with the correct “if”. A PEG uses a greedy algorithm that gives a priority to the different possibilities. It doesn’t need to be told what to do in every conflicting situation; it just takes the highest priority. This makes the algorithm a lot simpler, and possibly more efficient. This is ideal for programming languages, whose main objective is to be parsed into something else and which can be designed around this assumption. Chomsky’s theory of language was made to describe existing (natural) languages, so it is normal that it may not be the best-fitting tool for creating a new one.

Another thing: when creating a parser from a context-free grammar, we need both a parser and a lexer (also called a scanner). The lexer does a top-down operation on small pieces (numbers, variables, etc.) in order to let the parser handle the bigger chunks (instructions, functions, etc.) in a bottom-up way. When parsing top-down, there is no need for two tools; this is all handled in a single operation. The main reason why this can be done with a PEG is that repetition is handled without recursion. The more I learn about lexers, the more they feel like a big hack. Good riddance, hi hi!
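The “else” example can be tried on a toy grammar. This is a minimal sketch, not the C3 grammar: the three-alternative rule below and the one-letter tokens ('i' for an “if” with its condition left out, 'e' for “else”, 's' for a simple statement) are assumptions made up for the illustration.

```cpp
#include <cstddef>
#include <string>

// Toy PEG, alternatives tried in this order:
//   Stmt <- 'i' Stmt 'e' Stmt   (if ... else ...)
//         / 'i' Stmt            (if without else)
//         / 's'                 (simple statement)
// On success, `pos` is advanced and `tree` describes what was matched.
// On failure, `pos` is left where it started (backtracking).
bool parse_stmt(const std::string& in, std::size_t& pos, std::string& tree) {
    // Alternative 1: 'i' Stmt 'e' Stmt
    if (pos < in.size() && in[pos] == 'i') {
        std::size_t p = pos + 1;
        std::string then_tree, else_tree;
        if (parse_stmt(in, p, then_tree) && p < in.size() && in[p] == 'e') {
            ++p;
            if (parse_stmt(in, p, else_tree)) {
                pos = p;
                tree = "if(" + then_tree + ")else(" + else_tree + ")";
                return true;
            }
        }
    }
    // Alternative 2: 'i' Stmt
    if (pos < in.size() && in[pos] == 'i') {
        std::size_t p = pos + 1;
        std::string then_tree;
        if (parse_stmt(in, p, then_tree)) {
            pos = p;
            tree = "if(" + then_tree + ")";
            return true;
        }
    }
    // Alternative 3: 's'
    if (pos < in.size() && in[pos] == 's') {
        ++pos;
        tree = "s";
        return true;
    }
    return false;
}

// Parse a whole input and return the tree, or "error".
std::string parse(const std::string& in) {
    std::size_t pos = 0;
    std::string tree;
    if (parse_stmt(in, pos, tree) && pos == in.size()) return tree;
    return "error";
}
```

With the input "iises" (two “if”s, a statement, one “else”, a statement), parse returns "if(if(s)else(s))": the inner “if” tries its highest-priority alternative first and takes the “else”, so no extra rule is needed to settle the conflict.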

A funny thing is that Bjarne Stroustrup himself now regrets his choice of a bottom-up parser. At that time he was counseled by language and grammar specialists that this was the only way to go. But the implementation was so complex, he now thinks that a simple recursive descent parser would have done the job after all.

The type of parser the parser generator is going to create is a packrat parser, which is a type of recursive descent parser. Recursive descent parsers have backtracking. That means that if the parser realizes it took the wrong path because of its greedy nature, it will go backward and try the next solution in the priority table. With packrat parsing, if a subpart of the bad parse finished correctly, its result is stored in a table to be reused in the following attempts if possible. This optimization is named memoization.

So at this time, my husband is writing a PEG packrat parser generator. This isn’t a necessary task; he could have just jumped into coding the parser itself. But having the parser generator will make things more flexible, and it’s also fun to implement (for him at least). There wasn’t any satisfying generator that he knew of when he started coding it, including Spirit, which has poor compile-time performance and too much complexity coming from C++ template techniques (after all, he wants to reimplement everything in C3 at some point). He now has a “PEG PEG”, which parses PEG description files, and is working toward a PEG parser generator.
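Here is a minimal sketch of that table, on a toy grammar rather than anything from C3. The grammar, the PackratParser struct and its counters are assumptions made up for the illustration; this version stores failures as well as successes, which is a common choice.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Marker meaning "this rule does not match at this position".
const std::size_t kFail = static_cast<std::size_t>(-1);

// Toy grammar:
//   Sum <- Num '+' Sum / Num
//   Num <- [0-9]+
struct PackratParser {
    std::string input;
    std::map<std::size_t, std::size_t> num_memo;  // start -> end position, or kFail
    int num_runs = 0;   // times Num was really parsed
    int memo_hits = 0;  // times a stored result was reused

    explicit PackratParser(std::string text) : input(std::move(text)) {}

    // Num <- [0-9]+, memoized by start position.
    std::size_t parse_num(std::size_t pos) {
        auto found = num_memo.find(pos);
        if (found != num_memo.end()) {
            ++memo_hits;
            return found->second;
        }
        ++num_runs;
        std::size_t end = pos;
        while (end < input.size() &&
               std::isdigit(static_cast<unsigned char>(input[end]))) {
            ++end;
        }
        std::size_t result = (end > pos) ? end : kFail;
        num_memo[pos] = result;
        return result;
    }

    // Sum <- Num '+' Sum / Num
    std::size_t parse_sum(std::size_t pos) {
        // Alternative 1: Num '+' Sum
        std::size_t after_num = parse_num(pos);
        if (after_num != kFail && after_num < input.size() &&
            input[after_num] == '+') {
            std::size_t after_sum = parse_sum(after_num + 1);
            if (after_sum != kFail) return after_sum;
        }
        // Alternative 2: Num. The parser backtracks to `pos` and asks
        // for Num again; the table answers without re-reading the digits.
        return parse_num(pos);
    }

    // True when the whole input is a Sum.
    bool matches() { return parse_sum(0) == input.size(); }
};
```

Each time the first alternative is abandoned, the second one asks for Num at the same position and the table replies immediately. On a grammar this small the saving is tiny, but the same table is what keeps a backtracking parser from redoing large amounts of work.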

There are many concepts in this post that you may never have heard of. This is totally normal. I encourage you to follow the links and read about them. I don’t know if it will make you a better person, but this is all very interesting and essential to understand the fundamentals of programming languages.

The early days

The first ideas about C3 emerged while my husband was at university. He was a first-class student with very high marks, even though he wasn’t the most studious. These led him to internships in major companies, but the inspiration didn’t really come from there; the dream was already in him before he entered university in 2000. At that time, he wanted to create a language as simple as Visual Basic, but as powerful as C++. Learning more about C++ made the idea evolve into something else: the more he coded, the more ideas he had about a new language, which would get the name C3 three years later.

The first step into reality was a C3 parser made with Spirit, which is a C++ compile-time parser generator based on expression templates. This parser takes a C3 file, creates the abstract syntax tree and outputs it to a file in XML format. This is somewhat of a proof of concept, a demonstration that the grammar works on practical examples. He acknowledges that this isn’t a full proof, because he hasn’t shown that it works for all possibilities. This isn’t a priority anyway; many languages haven’t proved that either. This experience proved two important things. First, the grammar is well designed, and there is no need for hacks to generate the abstract syntax tree (hacks that many languages need, including C++). Second, he is a good enough programmer to handle such a project! I wouldn’t have married him if I wasn’t sure of that. Ah ah! At least, I wouldn’t be writing this blog.

There is a big void after this. He talked about it quite often and probably thought a lot about it, and things continued to grow in his head, but no coding happened for a while. That’s when I met him; I don’t feel I was the distraction, however. The company we worked for was very small, and even though we had fun, we worked many hours, which didn’t leave much inspiration for coding at home. This would last from 2004 to 2008. The company changed a lot in this period; there are now more than 10 times as many people as there were when we were hired. At the end of 2008, my soon-to-be husband completed a very important project and took a few weeks off, connecting with the Christmas vacation, which made a full month of time to spare. In the second week, he started coding again, as if this essential need to create C3, buried for too long, had found the surface once again. He hasn’t stopped since, and now that he has transferred to a more stable department, I feel confident he won’t stop. I hope that my popularization work will help him keep his motivation; this is my main goal in writing this blog (and maybe to entertain you all a little, ah!)

Why the name C3

My husband told me last night the meaning of the name C3. He was surprised that he hadn’t already told me about it a long time ago. Of course, the notion of a third C comes to mind, the first and second ones being C and C++. That’s all right, and my husband thinks it’s a worthy successor, one that keeps the spirit of its predecessors, at least more than other pretenders like C# and D.

You can call it “C three” if you want, but in fact the real pronunciation is “C cubed”. This notion of volume expresses the orthogonality of the language, the fact that all paradigms can be used in conjunction with the others. In fact C^n would have suited the language better, but still, the notion of volume says something very important: what is drawn inside simple axes is bigger than the sum of its basic elements.

For some time he thought about changing the language’s name, maybe to break with the C legacy. But when you have called something by a name for that many years, it’s hard to get rid of it. I put an ultimatum on this when I bought my domain name. There is no intention to change it now.