Ancestors of C³

We had a very slow weekend. Some friends came over Saturday evening, and we went to sleep after too much food, wine, beer and video games. No wonder it was so hard to get out of bed Sunday morning. So instead of getting things done, we spent some time together, talking about C³ on the pillow. Alex was telling me how existing programming languages inspire him and make things possible for him. He suggested I write something about it. I thought it was a very good idea.

C³ doesn’t intend to change everything. On the contrary, my husband wants to take what is best in existing languages and bring it to the next level. Reinventing the wheel is not about starting from scratch; it’s about learning from the past, taking advantage of tools that your ancestors didn’t have, and innovating at the same time.

Low-level inspirations

An inspiration I didn’t expect was LLVM. We have coded in C and C++ for so many years that those languages have infused themselves into our minds. LLVM, on the other hand, has been in our lives for only a short time, and we haven’t really coded with it, yet it seems to be an important source of solutions for problems he had earlier this year. Almost too good to be true.

The first thing that came to mind was that LLVM makes a perfect back-end. It fits the requirements: it is low-level enough to provide control, and it pays special attention to performance. The C³ compiler is planned to be flexible enough to support different back-ends, but it seems very likely that LLVM will be the first one.

But LLVM is not only a tool; its implementation contains some nice things too. The analyses and optimizations that LLVM performs on the SSA (Static Single Assignment) form are very inspiring. The algorithms behind them are very powerful, and my husband wants to adapt them for his own optimization system, or in short, apply them to high-level types. The list is very long, but to name a few: “Dead-code elimination”, “Constant propagation”, and “Combine redundant instructions”.
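To give a feel for two of these passes, here is a minimal C++ sketch on a toy SSA-style program. It is only an illustration: the names `Op`, `Instr`, `foldConstants` and `liveValues` are made up for this example, and the code is neither LLVM's implementation nor anything from C³.

```cpp
#include <cassert>
#include <vector>

// Toy SSA-style instruction (hypothetical, for illustration only).
// Each instruction defines exactly one value, named by its index in the
// program, and operands always refer to earlier indices.
enum class Op { Const, Add, Mul };

struct Instr {
    Op op;
    int lhs;  // Const: the literal value; otherwise index of the left operand
    int rhs;  // Const: unused; otherwise index of the right operand
};

// Constant propagation with folding: an Add/Mul whose operands are both
// constants is rewritten into a Const, which later instructions can reuse.
void foldConstants(std::vector<Instr>& prog) {
    for (auto& in : prog) {
        if (in.op == Op::Const) continue;
        const Instr& a = prog[in.lhs];
        const Instr& b = prog[in.rhs];
        if (a.op == Op::Const && b.op == Op::Const) {
            int v = (in.op == Op::Add) ? a.lhs + b.lhs : a.lhs * b.lhs;
            in = Instr{Op::Const, v, 0};
        }
    }
}

// Dead-code elimination, reduced to its analysis step: mark every value
// the result depends on. Anything left unmarked could be deleted.
std::vector<bool> liveValues(const std::vector<Instr>& prog, int result) {
    std::vector<bool> live(prog.size(), false);
    live[result] = true;
    // Operands have smaller indices, so one backward sweep is enough.
    for (int i = static_cast<int>(prog.size()) - 1; i >= 0; --i) {
        if (!live[i] || prog[i].op == Op::Const) continue;
        live[prog[i].lhs] = true;
        live[prog[i].rhs] = true;
    }
    return live;
}

void demo() {
    // v0 = 2; v1 = 3; v2 = v0 + v1; v3 = v2 * v2; v4 = v0 + v0 (never used)
    std::vector<Instr> prog = {
        {Op::Const, 2, 0}, {Op::Const, 3, 0},
        {Op::Add, 0, 1},   {Op::Mul, 2, 2}, {Op::Add, 0, 0}};
    foldConstants(prog);
    assert(prog[3].op == Op::Const && prog[3].lhs == 25);
    std::vector<bool> live = liveValues(prog, 3);
    assert(live[3] && !live[4]);
}
```

Because every value is assigned exactly once, both passes stay simple: folding never has to worry about a variable changing later, and liveness is a single sweep. That property is what makes the SSA form attractive for this kind of work.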

Good old C is also an inspiration. To be more specific, he really likes the way the abstractions of the language map directly onto modern computer architectures. As he mentioned in his latest notes: the flexibility of a set of native abstractions (that may change between target machines!) may be the core element that will allow C³ to be the bridge toward the next generation of computers.

Syntax and types

I have talked a lot about C++ as an inspiration. One feature of C++ that he wants to push beyond its current boundary is the ability for users to create their own types. What he wants is to make user-defined types first-class citizens of the language. Types such as int, float, long and double don’t have to be special cases. The language architecture is designed so that every type can be used without additional limitations and optimized as much as an intrinsic type would have been.
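As a point of reference, here is a small C++ sketch of how far today's C++ already goes in that direction. The `Complex` type and the `square` function are hypothetical names chosen for this example; this is not C³ code, only an illustration of a user-defined type sharing generic code and compile-time evaluation with `int`.

```cpp
#include <cassert>

// A user-defined value type (hypothetical, for illustration only).
struct Complex {
    double re;
    double im;
};

constexpr Complex operator+(Complex a, Complex b) {
    return Complex{a.re + b.re, a.im + b.im};
}

constexpr Complex operator*(Complex a, Complex b) {
    return Complex{a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re};
}

constexpr bool operator==(Complex a, Complex b) {
    return a.re == b.re && a.im == b.im;
}

// One generic function, written once, used with a built-in type and a
// user-defined type alike.
template <typename T>
constexpr T square(T x) {
    return x * x;
}

// Both are evaluated by the compiler, with no special case for int.
static_assert(square(3) == 9, "built-in type");
static_assert(square(Complex{0.0, 1.0}) == Complex{-1.0, 0.0}, "i * i == -1");

void demo() {
    Complex z = Complex{1.0, 2.0} + Complex{3.0, 4.0};
    assert(z == (Complex{4.0, 6.0}));
}
```

The remaining gap is on the optimizer's side: the compiler knows algebraic facts about `int` that it has no way of learning about `Complex`, and that gap is the part C³ aims to close.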

The C++ syntax is a familiar one but flawed in so many ways, especially the type declaration syntax. Bjarne Stroustrup himself admits that he kept it this way only for backward compatibility with C. It is a mess to parse, and some contradictions have led to many debates: do I put my * with the pointee type or with the pointer name? And yet many languages like Java and D continue to use it as an inspiration for their syntax, and so does C³. The reality is that the C++ syntax contains everything necessary for a good syntax, and people love it! It just needs a good clean-up. My husband describes the C³ syntax as a “naive” version of the C++ syntax, meaning the syntax you would intuitively expect rather than the one you discover in the field.
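The * debate is easier to see with a few declarations. The following is a minimal C++ sketch with made-up variable names (`first`, `second`, `table`, `row`, and so on); it shows standard C++ behavior only and says nothing about what the C³ syntax will look like.

```cpp
#include <cassert>
#include <type_traits>

// Reads like "two pointers to int", but the * binds to the name, not the
// type: only `first` is a pointer, `second` is a plain int.
int* first, second;

static_assert(std::is_same<decltype(first), int*>::value, "first is int*");
static_assert(std::is_same<decltype(second), int>::value, "second is int");

// The same inside-out reading rule separates these two declarations.
int* table[3];   // array of 3 pointers to int
int (*row)[3];   // pointer to an array of 3 ints

// With an alias, the declaration reads left to right, as one might
// naively expect: both variables are pointers.
using IntPtr = int*;
IntPtr third, fourth;

static_assert(std::is_same<decltype(fourth), int*>::value, "fourth is int*");

void demo() {
    int cells[3] = {1, 2, 3};
    row = &cells;        // points at the whole array
    first = &cells[0];   // points at one element
    second = *first;     // a copy of the value, not a pointer
    assert((*row)[2] == 3);
    assert(second == 1);
}
```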

Multiple paradigms

Having multiple paradigms is a major aspect that my husband wants to push further. Being able to mix and match different programming styles within the same language is a rare quality that we love in C++. Some believe in purity of style; we believe that the right toolbox has a variety of tools that cover most situations, and even allows you to build your own!

One of these tools is generic programming. In C++, it is a powerful but complex style. Templates make it possible by decoupling algorithms from the data structures they process. Inspired by the writings of Alex Stepanov and his work on the STL, my husband intends to make generic programming more intuitive, for example through the unification of static and dynamic polymorphism.
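To show what there is to unify, here is a minimal C++ sketch of the same algorithm written twice, once with static polymorphism and once with dynamic polymorphism. All names (`Square`, `Rect`, `Shape`, `ShapeModel`, `totalArea`, `totalAreaDynamic`) are hypothetical, and this is plain C++, not the C³ design.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Two unrelated types that happen to offer the same operation.
struct Square {
    double side;
    double area() const { return side * side; }
};

struct Rect {
    double w, h;
    double area() const { return w * h; }
};

// Static polymorphism: the element type is resolved at compile time, so
// the call can be inlined, but one container holds one type only.
template <typename Range>
double totalArea(const Range& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes) sum += s.area();
    return sum;
}

// Dynamic polymorphism: a virtual interface lets one container mix
// types, at the price of an indirect call and a separate code path.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

// Small adapter that wraps any type with area() behind the interface.
template <typename T>
struct ShapeModel : Shape {
    T impl;
    explicit ShapeModel(T t) : impl(t) {}
    double area() const override { return impl.area(); }
};

double totalAreaDynamic(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double sum = 0.0;
    for (const auto& s : shapes) sum += s->area();
    return sum;
}

void demo() {
    std::vector<Square> squares = {Square{2.0}, Square{3.0}};
    assert(totalArea(squares) == 13.0);

    std::vector<std::unique_ptr<Shape>> mixed;
    mixed.emplace_back(new ShapeModel<Square>(Square{2.0}));
    mixed.emplace_back(new ShapeModel<Rect>(Rect{1.0, 5.0}));
    assert(totalAreaDynamic(mixed) == 9.0);
}
```

The loop body is identical in both functions, yet the programmer has to write it twice and pick one world or the other up front. That duplication is the kind of friction a unification would remove.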

My husband often says that functional programming is an important part of C³, through the use of functions and closures. In C³ it is done with a “function” type that allows functions to be assigned to variables (which means support for closures as well). When I think about LISP, I think about parentheses. But after reading the essays of Paul Graham, I have to admit that functional programming can often be the best tool for the job, especially if the syntax is made friendly. An unexpected inspiration that has lots of potential.
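The closest thing in today's C++ is probably `std::function`, so here is a minimal sketch using it. The helper names `twice`, `makeAdder` and `applyTwice` are invented for this example, and the C³ “function” type may well look and behave differently.

```cpp
#include <cassert>
#include <functional>

// An ordinary function with no state.
int twice(int x) { return 2 * x; }

// Returns a closure: the lambda captures n by value and carries it
// along after makeAdder has returned.
std::function<int(int)> makeAdder(int n) {
    return [n](int x) { return x + n; };
}

// Accepts anything callable as int(int), whether a plain function or a
// closure, and applies it two times in a row.
int applyTwice(const std::function<int(int)>& f, int x) {
    return f(f(x));
}

void demo() {
    // A function stored in a variable, then called through it.
    std::function<int(int)> f = twice;
    assert(f(21) == 42);

    // The same variable can later hold a closure with captured state.
    f = makeAdder(5);
    assert(f(1) == 6);

    assert(applyTwice(twice, 3) == 12);        // 3 -> 6 -> 12
    assert(applyTwice(makeAdder(5), 1) == 11); // 1 -> 6 -> 11
}
```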

Not only does he plan the unification of static and dynamic polymorphism, he is planning the unification of imperative and functional programming as well. Isn’t that interesting…

So many…

Each of these elements would be worthy of a post of its own, and there are so many other inspirations, not counting those we are not even conscious of! Great languages already exist and we have so much to learn from them. I find it very exciting to delve into the complex but fascinating field that is programming language design. And just in case you worry about the impact of all this on us as a couple, we do talk about C³ often but we still try to get some things done: we are almost done with the wedding thank-you cards. I know we are very late on this… Better late than never!

8 thoughts on “Ancestors of C³”

  1. Hi MissTick and Alex!

    I find your posts very interesting. I’m working on a language with similar goals for my master’s thesis.

    I also considered LLVM but it created a 1MB executable to define a function that could multiply two numbers = (static linkage, but still). Right now I’m implementing an interpreter, I hope to use LLVM in the future.

    From what you have posted, I imagine C3 to be a language that can work in a lower level than C (-you can-, not -you must- though). In my opinion, very few problems (aside from some potential common-subexpression-elimination optimizations) remain for C++0x’s user defined types in terms of performance, could you elaborate?

    The way to unify GP and OOP is conceptually simple, and Stepanov described it many years ago. The real problem lies in runtime overload resolution, which is the ideal solution, but it must be fast enough to be worthwhile, and in principle we would need to instantiate templates at runtime if we want to allow arbitrary use of objects whose types are not known at compile time (unserializing objects from a data file and then calling some function, for example).

    Functions as defined in C can be (unfortunately) more efficient than function objects (when calling them directly, not via pointers) because function objects’ state can be stored at potentially any address, unknown at compile time. I’m still working on a good solution.

    Looking forward to more posts!

  2. Thanks for your very interesting comment!

    Was the 1MB executable created with the LLVM static compiler? Was it LLVM bitcode or a target machine executable? Was it compiled from LLVM assembly or C? It probably depends mostly on libraries that got automatically linked with it.

    “I imagine C3 to be a language that can work in a lower level than C” Yes! You’re absolutely right. However, as you said, a big part of this potential will be left unused for some time, until there are enough people working on the C3 compiler and library.

    I know that C++0x adds rvalue references that eliminate some of the overhead of temporary variables, but it still requires the programmer to be very careful when using value types. For example, passing parameters by value is still costly. Also, the usage of rvalue references is far from being simple and intuitive.

    What I plan for GP, OOP and FP is to have a syntax that supports them all but always only support the most static behavior in the base language. More dynamic features will be supported by hooks to add support in C3 code. It means that I won’t need to support the general case and I will be able to gradually provide useful implementations along the static-dynamic continuum.

    In C3, there will not be a single representation for each language primitive. It means that functions that don’t require a state will be exactly the same as in C. Function objects will be defined by other types that will provide the more complicated call implementation. Moving forward along the static-dynamic axis, there will be something like boost::function that completely hides the underlying implementation.

  3. Hi Alex, thanks for your fast reply!

    The 1MB executable was a program that called LLVM libraries to define and execute a function at runtime (in my thesis I said I was going to implement an interpreter). It linked only the necessary libraries. Unladen Swallow (Google’s attempt to speed up Python) is using LLVM too but found it not very convenient for interpreting. It may be better for compiling (Go didn’t like it either, but I still think it’s the best choice to get an optimizing compiler as soon as possible).

    I agree with your opinion about rvalues in C++0x. For my thesis I designed an alternative mechanism which is simpler and even faster than C++0x. I have not found any other important hole that needs to be covered. However, Stepanov has said that if we could say everything we know about a type, the compiler could apply a lot of other optimizations that it currently applies only to primitives (well, those are the only types it knows). That may be the future, but I consider it too complex for my taste, at least in the short term. If something good enough gets done soon it may be great (C++ ‘axioms’ probably had potential but will have to wait some years (5 at least)).

    When I designed my GP and OOP unification (a simple one, doesn’t cover runtime template calling), I noticed there are too many places where the programmer would like to have control. C3 could be very useful there.

    I find it a bit weird to have a language whose primitives may change. If I understand correctly, C3 is a language that covers everything a systems programming language may need? (no other language below C3 and above assembler) Some of the roles of compiler writers would be transferred to the language. My language is not that ambitious (well, I need to implement it soon, otherwise I won’t graduate!).

  4. Go didn’t like LLVM because it wasn’t as fast as a non-optimizing compiler. And for their optimizing compiler they chose GCC because Ian already knew it well, so it was simpler for them.

    I’d expect to see an LLVM-based Go compiler some time soon.

  5. Wow, such great comments guys, it is really great to read them and it gives me interesting insight on the use of LLVM in the future.

    @Rodrigo, if you have a website on your project, I would be really interested to have a look at it!

  6. @foobar & @c3wife
    I highly recommend the following presentation:
    “Source Code Optimization” by Felix von Leitner (October 2009). It compares the optimizations applied by GCC 4.3, Intel CC, LLVM and Visual Studio, and GCC seems impressive. Unladen Swallow finished their 2009Q3 release and they point out that LLVM was harder to use than expected, but I’m not sure using GCC is viable without a local expert around.

    @c3wife
    Still don’t 🙁 but I may have it in some months. Right now I’m coding the parser (LL(1), so it’s simple to do by hand) and I won’t put too much effort into code generation as it will be an interpreter (no intermediate-representation–emit-dummy-asm-code-fast).

  7. BTW Unladen Swallow is an interpreter, so their comments may be biased to the interpreter mode of LLVM, which hasn’t received as much attention as the compiler!

  8. The feedback from Unladen Swallow about LLVM is very interesting. The good news is that they are also helping by discovering bugs. They also build community experience using it for something else than static code generation.

    Since I want to use LLVM (or something similar) both for compile-time meta-programming (that is, interpretation in the compiler) and static code generation, at some point both will need to be fast. However, I think the most important concern will be the speed of the generated code for those who will use C3 in its early days.

    I read “Source Code Optimization” by Felix von Leitner right now.
