6

I have seen several applications that claim to convert Java code to valid C or even C++. Converting from a high-level language to a low-level language is possible, no doubt about it. But, at least in theory, can the reverse be done without any manual steps?

For instance:

  • Converting Assembly to C or Machine Code to Assembly?

  • Hardware Description Languages (HDL) to Assembly? (whichever is lowest?)

  • C to C#?

4 Answers

9

Although it's possible, it's likely that the "lifting" compiler will end up generating code whose structure emulates the programming model of the lower-level language. Thus, you will end up with "COBOL in Haskell" or "ASM in Java" or what-have-you, and it will be more complex and less efficient than your lower-level language.

For instance, if the lower-level language has explicit memory management and your higher-level language does not, you cannot just throw away the frees -- perhaps the behavior of the underlying program depends upon determinism. So you would have to model, in your high-level language, the memory model of the low-level language (yuck). Similarly, if the lower-level language has arbitrary gotos a la JMP you'd have to generate high-level code where such gotos could be executed (arbitrary function boundaries).
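To make the memory-model point concrete, here is a minimal sketch of what "modeling the low-level memory model" could look like in Python. The `SimulatedHeap` class and its method names are made up for illustration; a real lifting tool would need far more (alignment, pointer arithmetic across blocks, address reuse), so read this as a toy under those assumptions.

```python
class SimulatedHeap:
    """Toy model of a C-style heap: integer addresses, explicit malloc/free."""

    def __init__(self):
        self._cells = {}     # address -> value, one entry per allocated word
        self._blocks = {}    # base address -> size of the live block
        self._next = 0x1000  # next unused address (never reused in this sketch)

    def malloc(self, size):
        base = self._next
        self._next += size
        self._blocks[base] = size
        for offset in range(size):
            self._cells[base + offset] = 0
        return base

    def free(self, base):
        size = self._blocks.pop(base)  # KeyError here models a double free
        for offset in range(size):
            del self._cells[base + offset]

    def load(self, addr):
        return self._cells[addr]       # KeyError here models a use-after-free

    def store(self, addr, value):
        if addr not in self._cells:
            raise KeyError(addr)       # write to memory that is not allocated
        self._cells[addr] = value


heap = SimulatedHeap()
p = heap.malloc(2)        # int *p = malloc(2 * sizeof(int));
heap.store(p + 1, 42)     # p[1] = 42;
value = heap.load(p + 1)  # int value = p[1];
heap.free(p)              # free(p);
```

Every pointer access in the original program turns into a call on this object, which is exactly the "yuck": the translated code is still C, just spelled in Python.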

The reason that decompilers do not face this problem is that they are not really working with the complete capabilities of the underlying machine code unless they are on a VM that happens to be very tightly coupled to a language's programming model.

Larry OBrien
  • 5,037
3

Theoretically it's possible to write a compiler from any Turing complete language to any other Turing complete language.

Practically, going from a lower level language to a higher level one will be highly problematic, as you are going towards higher abstraction and that usually requires humans. Think of how there isn't a "correct" approach in object orientation...

For languages at a similar level, it's a little bit easier. Check out code2code, which translates C++ code to C# and VB.NET. And since some C++ code is valid C code, you can say that it also translates C to C#, to a certain extent.

yannis
  • 39,647
3

There is an important difference between converting to a high-level language from something that has been written manually vs. from something that has been generated automatically.

In the first case, little, if anything, is there to direct your reverse translation, so your translator will be "writing Fortran programs in (some other) language".

The second case is different, though: compilers leave behind enough "marks" to make reverse translation a possibility. For instance, you can examine binary code generated from C++, and figure out a lot of things about the classes from which the code has been generated:

  • You can learn the layout of the fields in a class by examining code that accesses a class
  • You can find virtual functions of the class by examining vtables
  • You can find member functions by un-mangling the names from the .o files
  • You can make educated guesses about constants that were defined in a class
  • You can translate expressions back to a human-readable form, perhaps with fewer or more sets of parentheses
  • You can detect uses of common STL containers expanded from templates
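As a small illustration of the un-mangling bullet, the sketch below decodes the simplest kind of symbol that GCC and Clang emit under the Itanium C++ ABI: a nested name such as `_ZN5Shape4areaEv`. The function name `demangle_nested_name` is made up here, and the code deliberately ignores templates, operators, substitutions, qualifiers and parameter types, so it is a toy for a narrow subset and not a replacement for a real demangler.

```python
def demangle_nested_name(symbol):
    """Decode symbols of the form _ZN<len><name>...E into 'A::B::c'.

    Only plain nested names are handled; anything else raises ValueError.
    """
    prefix = "_ZN"
    if not symbol.startswith(prefix):
        raise ValueError("not a nested mangled name: " + symbol)

    pos = len(prefix)
    parts = []
    while pos < len(symbol) and symbol[pos] != "E":
        # Each component is a decimal length followed by that many characters.
        digits_start = pos
        while pos < len(symbol) and symbol[pos].isdigit():
            pos += 1
        if pos == digits_start:
            raise ValueError("unsupported construct at offset %d" % pos)
        length = int(symbol[digits_start:pos])
        parts.append(symbol[pos:pos + length])
        pos += length

    if pos >= len(symbol):
        raise ValueError("truncated symbol: " + symbol)  # no closing 'E'
    return "::".join(parts)


print(demangle_nested_name("_ZN5Shape4areaEv"))
```

Even this tiny decoder recovers a class name and a member function name, which is more than you could ever get from hand-written assembly.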

Granted, the result will never be identical to the original source, because nice things such as comments and names of local variables are irretrievably lost. But you would definitely get something better than a piece of assembly written out using the C++ syntax.

0

Not much. Translating from a higher-level language to a lower-level language means re-implementing all of the higher-level language features using only those features available in the lower-level language. This is basically what a compiler does. E.g., one line of source might become many, many assembly language instructions.

If you're translating from a lower-level language to a higher-level language, you can either (a) just implement exactly the same program in the higher-level language, e.g. translate a compiled machine language program into a sequence of Python commands for "model registers using these variables, load this value into this register, add these registers, store this value, jump to line xxx" and so on, which is pretty useless. (For instance, almost all C programs are already valid C++ programs, or very nearly so, just without using any of the features that make using C++ useful.)
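Option (a) might look roughly like the sketch below. The assembly in the comments is a made-up, ARM-flavoured loop that sums the integers from n down to 1; the function name and the block numbering are likewise invented for illustration, and n is assumed to be non-negative.

```python
def sum_to_n_literal(n):
    """Literal translation of a hypothetical assembly loop (assumes n >= 0)."""
    r0 = 0  # MOV r0, #0   ; accumulator
    r1 = n  # MOV r1, n    ; counter
    pc = 0  # stand-in for the program counter: which block runs next

    while True:
        if pc == 0:    # loop: CMP r1, #0 / BEQ done
            pc = 2 if r1 == 0 else 1
        elif pc == 1:  # ADD r0, r0, r1 / SUB r1, r1, #1 / B loop
            r0 = r0 + r1
            r1 = r1 - 1
            pc = 0
        elif pc == 2:  # done: return the accumulator
            return r0


print(sum_to_n_literal(4))
```

It runs and gives the right answer, but nobody would call it Python in any meaningful sense: the registers and jumps are all still there, and `sum(range(n + 1))` is nowhere in sight.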

Or (b) try to guess what original language features were translated into the lower-level language. If the lower-level language was originally compiled, this can be somewhat successful: the decompiler looks for the sorts of code compilers usually generate, and guesses what the original code may have been. An example picked off google: http://boomerang.sourceforge.net/cando.php.
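Option (b) amounts to pattern matching on compiler idioms. Here is a deliberately tiny sketch that recognizes exactly one loop shape in a made-up instruction format (tuples of opcode and operands) and emits Python source for it. The instruction set, the `COUNTDOWN_LOOP` listing and the function name are all hypothetical; a real decompiler such as Boomerang works on control-flow graphs and handles many more shapes.

```python
# Hypothetical instruction format: (opcode, operand, ...)
COUNTDOWN_LOOP = [
    ("label", "loop"),
    ("cmp", "r1", 0),
    ("beq", "done"),
    ("add", "r0", "r0", "r1"),
    ("sub", "r1", "r1", 1),
    ("b", "loop"),
    ("label", "done"),
]


def lift_countdown_loop(instructions):
    """Return Python source for the one loop shape known here, else None."""
    try:
        (
            (op0, start),
            (op1, counter, zero),
            (op2, exit_target),
            (op3, acc, acc_src, addend),
            (op4, dec_dst, dec_src, step),
            (op5, back_target),
            (op6, end),
        ) = instructions
    except ValueError:
        return None  # wrong number of instructions or operands

    matches = (
        (op0, op1, op2, op3, op4, op5, op6)
        == ("label", "cmp", "beq", "add", "sub", "b", "label")
        and zero == 0 and step == 1
        and exit_target == end and back_target == start
        and acc == acc_src and addend == counter
        and dec_dst == dec_src == counter
    )
    if not matches:
        return None
    return f"while {counter} != 0:\n    {acc} += {counter}\n    {counter} -= 1"


print(lift_countdown_loop(COUNTDOWN_LOOP))
```

The matcher only fires because the input has the exact shape a particular (imaginary) compiler would produce. Reorder two instructions by hand and it gives up, which is why this approach depends so heavily on knowing where the code came from.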

However, that only applies if it was originally compiled from that higher level language using a known compiler in the first place (or written by hand using many idioms which have corresponding language features).

If you have a pile of C code and you want to convert it to C++ code, you need to make value decisions about which bits of the code will be extensible, which functions should be grouped into classes, how to avoid global state, etc, etc, which is what programmers do, and can't be immediately automated.

Jack V.
  • 1,130