
For example, if I have a class like

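class Foo {
    public int bar;

    public Foo(int constructor_var) {
        bar = constructor_var;  // was a typo: construction_var
    }

    public int bar_plus_one() {  // a method needs a return type
        return bar + 1;          // bar++ would return the old value and mutate bar
    }
}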

Foo foo = new Foo(2);

and I type foo.ba in the IDE, I get bar suggested, or if I type String x = foo.bar(), I get red squiggles. How does the IDE become context aware? Is there a code querying language, is it reflection, or something else?

To clarify my question a little: I am asking because I want to be able to query my code base. I am looking for a tool where I can (essentially) say SELECT name FROM methods WHERE signature IS 3 ints, or something like that. I figure that whatever something like IntelliSense uses to make suggestions is where I should be looking.

Deduplicator
ewhiting

4 Answers


As a very high-level overview, the IDE contains a compiler. (Well, most parts of a compiler: it doesn't need to generate or optimize code, but all the rest is there: lexing, parsing, semantic analysis, type inference, type checking, macro expansion, symbol resolution, etc.)
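To make the "compiler without code generation" idea concrete in Java (the language of the question's example), here is a minimal sketch that drives the JDK's own javac front end through javax.tools and com.sun.source.util.JavacTask. It parses and type-checks source held in memory without writing class files, and the error diagnostics it collects are the kind of information an editor turns into red squiggles. It assumes a full JDK (ToolProvider.getSystemJavaCompiler() returns null on a plain JRE); FrontEndCheck and StringSource are made-up names, and this is not a description of how any particular IDE is built.

```java
import com.sun.source.util.JavacTask;
import java.io.IOException;
import java.net.URI;
import java.util.List;
import java.util.stream.Collectors;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class FrontEndCheck {
    // Hypothetical helper: serves source text from memory so nothing is written to disk.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String name, String code) {
            super(URI.create("string:///" + name + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses and type-checks the code and returns the codes of any errors; no class files are generated. */
    static List<String> errorCodes(String name, String code) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null without a JDK
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, diagnostics, null, null, List.of(new StringSource(name, code)));
        task.parse();   // lexing and parsing into syntax trees
        task.analyze(); // symbol resolution and type checking, but no code generation
        return diagnostics.getDiagnostics().stream()
                .filter(d -> d.getKind() == Diagnostic.Kind.ERROR)
                .map(Diagnostic::getCode) // locale-independent key such as "compiler.err...."
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        String ok = "class Foo { int bar; } class Use { int m(Foo foo) { return foo.bar + 1; } }";
        String bad = "class Foo { int bar; } class Use { void m(Foo foo) { String x = foo.bar(); } }";
        System.out.println(errorCodes("Use", ok));  // expected: no errors
        System.out.println(errorCodes("Use", bad)); // expected: a "cannot find symbol" error for bar()
    }
}
```

An IDE would map each diagnostic's position back to the editor buffer instead of printing the codes.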

From the information gleaned from this analysis, the IDE constructs a semantic model of the code, and then, when it encounters incomplete code, it uses sufficiently advanced magic to figure out how best to complete it. (A simple algorithm would be to offer the shortest possible completion, but normally, IDEs are much more sophisticated than that.)
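The "simple algorithm" from the parenthesis can be sketched in a few lines, assuming the semantic model has already produced the candidate names for the current position. The candidate list below is hypothetical, and, as the answer says, real IDEs rank completions in much more sophisticated ways.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class NaiveCompletion {
    /** Keeps candidates that start with the typed prefix, shortest first (ties broken alphabetically). */
    static List<String> complete(List<String> candidates, String prefix) {
        return candidates.stream()
                .filter(c -> c.startsWith(prefix))
                .sorted(Comparator.comparingInt(String::length)
                        .thenComparing(Comparator.naturalOrder()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical members that the semantic model found for a receiver of type Foo.
        List<String> members = List.of("bar_plus_one", "baz", "bar", "toString");
        System.out.println(complete(members, "ba")); // [bar, baz, bar_plus_one]
    }
}
```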

Because of the code duplication between IDEs and compilers, there have been efforts in recent years to integrate the two. For example, Microsoft's Roslyn compiler for C# and Visual Basic .NET was explicitly designed with APIs that allow an IDE to access all the required information. Likewise, the nsc (New Scala Compiler) and dotc compilers for Scala, and the Clang compiler for C / C++, have APIs for embedding into an IDE.

Note that the compiler built into an IDE has some different requirements from a classic batch compiler: it needs to be asynchronous, reactive, concurrent, fast, and incremental, and most of the time the code it deals with will be incomplete, invalid, and contain errors. However, despite these conflicting requirements, it makes sense to merge the two into one, because doing so guarantees that the IDE and the compiler always have the same understanding of the code.

As a counter-example, IntelliJ IDEA uses its own compiler framework. IDEA uses a single language-agnostic semantic graph for the entire project, no matter how many different languages the project uses. This allows it to have really cool features such as automatically converting code between languages, or refactoring across languages in a polyglot project. But it runs into precisely the problem I mentioned above, especially often with Scala, where IntelliJ shows errors for code that actually compiles fine with the Scala compiler, or vice versa.

Microsoft has developed the Language Server Protocol, which is an API that allows IDEs to communicate with compilers using a standardized protocol. This means that compilers that implement the LSP will automatically work with every IDE that implements the LSP, and likewise, IDEs that implement the LSP will automatically support every language for which a compiler exists that implements the LSP. Nowadays, lots of compilers (e.g. the tsc TypeScript compiler, Idris, Scala) and IDEs (Visual Studio Code, Emacs, Vim) implement it.
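To show what "communicating using a standardized protocol" looks like on the wire, here is a minimal sketch of how a client might frame an LSP completion request: a JSON-RPC body preceded by a Content-Length header (counted in UTF-8 bytes) and a blank line. The file URI and cursor position are hypothetical, the JSON is assembled by hand only to keep the example dependency-free (the URI is not JSON-escaped), and a real client would also send initialize and didOpen messages first.

```java
import java.nio.charset.StandardCharsets;

public class LspFraming {
    /** Wraps a JSON-RPC body in the LSP base-protocol header; Content-Length counts UTF-8 bytes. */
    static String frame(String jsonBody) {
        int length = jsonBody.getBytes(StandardCharsets.UTF_8).length;
        return "Content-Length: " + length + "\r\n\r\n" + jsonBody;
    }

    /** Builds a textDocument/completion request; line and character are zero-based. */
    static String completionRequest(int id, String uri, int line, int character) {
        String body = "{\"jsonrpc\":\"2.0\",\"id\":" + id
                + ",\"method\":\"textDocument/completion\""
                + ",\"params\":{\"textDocument\":{\"uri\":\"" + uri + "\"}"
                + ",\"position\":{\"line\":" + line + ",\"character\":" + character + "}}}";
        return frame(body);
    }

    public static void main(String[] args) {
        // Hypothetical: the cursor sits just after "foo.ba" at the start of zero-based line 17.
        System.out.print(completionRequest(1, "file:///project/Main.java", 17, 6));
    }
}
```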

In the same vein and based on the success of the LSP, there is now an effort by the Scala community to define a Build Server Protocol that allows IDEs to abstract over build tools (SBT, Maven, Gradle, Mill).

Addendum: Everything I wrote above applies to "good™️" IDEs with semantic features. There are much, much simpler IDEs that, for example, simply offer every word of the current file (even words from comments) as completions, regardless of whether that word is even syntactically legal in that context.
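For contrast, the "every word in the file" approach from the addendum needs no compiler at all. This is a minimal sketch under that assumption: it treats any identifier-shaped run of characters as a word, so a word that only appears in a comment is offered just like a real field name.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCompletion {
    // Anything that looks like an identifier counts, whether it is code, a comment, or a string.
    private static final Pattern WORD = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Offers every word in the buffer that extends the typed prefix, sorted alphabetically. */
    static Set<String> complete(String buffer, String prefix) {
        Set<String> result = new TreeSet<>();
        Matcher m = WORD.matcher(buffer);
        while (m.find()) {
            String word = m.group();
            if (word.startsWith(prefix) && !word.equals(prefix)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String buffer = "// remember the barrier\nint bar = 1;\nfoo.ba";
        System.out.println(complete(buffer, "ba")); // [bar, barrier]  ("barrier" comes from the comment)
    }
}
```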

Jörg W Mittag

The IDE understands the code. It is able to parse it and extract all the necessary information for autocomplete, like what classes are available, their names and all their members. The IDE team most probably had to implement this parsing themselves, or use private APIs in the compiler.

And compilers do the same thing as their main function: the compiler builds a representation of the codebase for its own use. But compilers rarely expose that information to the outside world. So if you want to query your code, the most likely scenario is that you will have to implement your own parser, which might take a lot of effort, depending on the complexity of the language.

But if your language is C#, then you are in luck. Over the last few years, the C# compiler team has put effort into exposing exactly that information from their Roslyn compiler. So getting something like SELECT name FROM methods WHERE signature IS 3 ints is as trivial as importing a NuGet package, loading the code, and writing a LINQ query (demonstration).
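For Java (the language of the question's example), a rough analogue of that query is possible with the JDK's Compiler Tree API (com.sun.source.*), which ships with the JDK. The sketch below is an illustration under stated assumptions, not a general query tool: it requires a full JDK, only parses (no type resolution), and therefore matches the literal primitive type int, so Integer and int... parameters do not count. ThreeIntQuery, StringSource and the sample Geometry class are hypothetical names.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.MethodTree;
import com.sun.source.tree.PrimitiveTypeTree;
import com.sun.source.tree.Tree;
import com.sun.source.tree.VariableTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.lang.model.type.TypeKind;
import javax.tools.JavaCompiler;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class ThreeIntQuery {
    // Hypothetical helper: serves source text from memory.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String name, String code) {
            super(URI.create("string:///" + name + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Roughly "SELECT name FROM methods WHERE signature IS 3 ints", answered from syntax trees. */
    static List<String> methodsWithThreeInts(String code) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // requires a JDK
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource("Query", code)));
        List<String> names = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {
            new TreeScanner<Void, Void>() {
                @Override
                public Void visitMethod(MethodTree method, Void unused) {
                    List<? extends VariableTree> params = method.getParameters();
                    if (params.size() == 3 && params.stream().allMatch(p -> isInt(p.getType()))) {
                        names.add(method.getName().toString());
                    }
                    return super.visitMethod(method, unused); // keep scanning nested classes
                }
            }.scan(unit, null);
        }
        return names;
    }

    // True only for the primitive type int as written in the source.
    private static boolean isInt(Tree type) {
        return type.getKind() == Tree.Kind.PRIMITIVE_TYPE
                && ((PrimitiveTypeTree) type).getPrimitiveTypeKind() == TypeKind.INT;
    }

    public static void main(String[] args) throws IOException {
        String code = "class Geometry {\n"
                + "  int volume(int w, int h, int d) { return w * h * d; }\n"
                + "  int area(int w, int h) { return w * h; }\n"
                + "  void log(long a, int b, int c) { }\n"
                + "  static int sum(int a, int b, int c) { return a + b + c; }\n"
                + "}\n";
        System.out.println(methodsWithThreeInts(code)); // [volume, sum]
    }
}
```

Calling task.analyze() and using the resolved types instead of the syntax would be the next step if the query had to see through things like imports or inherited methods.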

CJ Dennis
Euphoric

How does the IDE become context aware?

Code completion and refactoring features are generally implemented by building an abstract syntax tree from the source code. The nodes of the AST represent things such as variables, operators and method calls. When you type foo., the IDE uses the AST to resolve the variable foo to the type Foo and then displays a list of members from that type.
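The two-step lookup described above (resolve foo to Foo, then list Foo's members) can be sketched with hand-built symbol tables. This is a toy model under an explicit assumption: the two maps are hypothetical stand-ins for what a real IDE would derive from the AST of the whole project.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MemberCompletion {
    // Hypothetical symbol tables; a real IDE builds these from the project's syntax trees.
    static final Map<String, String> VARIABLE_TYPES = Map.of("foo", "Foo");
    static final Map<String, List<String>> TYPE_MEMBERS =
            Map.of("Foo", List.of("bar", "bar_plus_one"));

    /** Completes "receiver.prefix": resolve the receiver's type, then filter that type's members. */
    static List<String> complete(String expression) {
        int dot = expression.lastIndexOf('.');
        if (dot < 0) {
            return List.of();
        }
        String receiver = expression.substring(0, dot);
        String prefix = expression.substring(dot + 1);
        String type = VARIABLE_TYPES.get(receiver);
        if (type == null) {
            return List.of(); // unknown variable: nothing sensible to offer
        }
        return TYPE_MEMBERS.getOrDefault(type, List.of()).stream()
                .filter(member -> member.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(complete("foo.ba")); // [bar, bar_plus_one]
        System.out.println(complete("qux.ba")); // []
    }
}
```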

I am asking because I want to be able to query my code base.

The simplest way to do this would be to write a plugin for your favorite IDE. Most IDEs expose the AST through their plugin APIs and make it easy to build new code analysis tools by leveraging their existing infrastructure.

casablanca

Your IDE may use an external plugin or tool to do autocompletion. For example, with Vim and C or C++, you may use clang-complete, which, as the name suggests, makes use of Clang's ability to suggest code completions for a given source file. For Python, your IDE or editor (such as Vim, VSCode, Atom, Emacs, Sublime, Gedit, ...) might use a plugin built on the jedi library's autocomplete features, for example jedi-vim in Vim. These external tools can be used with or without the IDE.

qwr