Recently, I watched this video of Steve Yegge giving a talk at the Northwest C++ Users' Group.
The title of his talk was "Open Scalable Language Toolchains", and in it he describes the project that he works on at Google.
Here is a copy of the abstract, and I'd like to highlight a few key points:
Modern IDEs and compilers generate a wealth of information, and you can't have any of it. Tools in the compiler family -- even the best IDEs -- tend to be monolithic, language-specific, generally non-scalable special-purpose applications. Even when they do support headless analysis, none of them do it the same way, and very few of them can do cross-language analysis. At Google I've put together a team with the long-term goal of addressing these problems in a general way. We've built infrastructure to run IDE-quality code analyzers such as Eclipse and clang over Google's entire corpus and all open-source code. We translate the intermediate representations into a language-neutral index, then serve the index data back through language-neutral APIs and query interfaces.
At the beginning of the talk, Steve says that his project is meant for dealing with gigantic code bases, say 50 million lines or more.
A lot of gigantic systems involve multiple languages. Consider Android, which is a mix of C++ and Java. Note that in one slide he says:
- Consistency also enables cross-language analysis
- Analyze across RPC calls, embedded languages.
"Sort of the holy grail, but we'll get there", he says.
Yes, I believe they will, if they are not there already, and it will let their software engineers navigate easily across the boundaries between languages.
The code indexing happens nightly on their distributed servers. Once generated, the index can be accessed from several different types of IDE clients. And the project is about more than just code browsing: it also enables static analysis. It sounds like they've already built a pretty neat static analysis query tool. All in all, it's an amazing and ambitious project.
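To make the idea of a "language-neutral index" a bit more concrete, here is a minimal Python sketch. It is not how Google's system works (the talk doesn't go into that level of detail); the CrossRefIndex class, the Location record, and the file paths are all hypothetical. The assumption is that per-language analyzers have already extracted definitions and references and recorded them under a shared symbol name, so a single query can cross from Java into C++.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    path: str
    line: int
    language: str  # e.g. "java" or "c++"; the index itself never interprets it


class CrossRefIndex:
    """Hypothetical language-neutral index: symbol -> definition and references."""

    def __init__(self):
        self.definitions = {}
        self.references = defaultdict(list)

    def add_definition(self, symbol, loc):
        self.definitions[symbol] = loc

    def add_reference(self, symbol, loc):
        self.references[symbol].append(loc)

    def find_definition(self, symbol):
        # Returns None for symbols the index has never seen.
        return self.definitions.get(symbol)

    def find_references(self, symbol):
        # .get() avoids inserting empty entries for unknown symbols.
        return list(self.references.get(symbol, []))


# Hypothetical data: a native method defined in C++ and called from Java.
index = CrossRefIndex()
index.add_definition("Decoder.decode", Location("native/decoder.cc", 42, "c++"))
index.add_reference("Decoder.decode", Location("app/Player.java", 17, "java"))
index.add_reference("Decoder.decode", Location("native/test_decoder.cc", 8, "c++"))

for loc in index.find_references("Decoder.decode"):
    print(loc.language, loc.path, loc.line)
```

Because the query layer only sees symbols and locations, an IDE client written for one language can still answer "who calls this?" for code written in another.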
I think it shows the emphasis that Google places on having quality tools. Facebook engineering manager Yishan Wong also placed a high priority on tools; the top priority, in fact. Granted, his post is about growing a small engineering team into a medium-sized one, but I believe tools should remain a very important priority even in a large organization. Yishan's other posts on engineering management are interesting as well.
The Importance of Code Browsing Tools for Software Development
A good code browsing tool will help you whether your code is well written or not.
If you've got a huge mass of poorly written code with a lot of duplication and a lot of extra coupling, a good code browsing tool will at least help you navigate around more quickly, which should help you (a little bit) in understanding the complexities of the code.
If your code is well written, then it is well factored. Part of having well-factored code is reducing duplication. It's the old DRY (Don't Repeat Yourself) principle from the Pragmatic Programmers.
Recently, Jake Scruggs tweeted this:
Reducing duplication increases coupling which isn't always a good trade. #rubyconf
And that is true. Suppose that I have some duplicated code in a few classes. If I factor it out into a single function, each of those classes now contains a reference (a function call) to that function. Now suppose that one of the function's parameters has to change. That's where the coupling shows up: the shared function needs to change, and so does every function that calls it. But consider what would have happened if I hadn't factored out that code. The code would have needed to change in each of the places where it was duplicated, which, on a larger scale, can be both tedious and error prone.
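Here's a small Python sketch of that trade-off. The money-formatting rule and all of the class and function names are made up purely for illustration; the point is only where edits land when something changes.

```python
# Duplicated version: the formatting rule is copied into each class.
# Changing the rule means finding and editing every copy.
class InvoiceDuplicated:
    def total_text(self, amount):
        return "$" + format(amount, ",.2f")


class ReceiptDuplicated:
    def total_text(self, amount):
        return "$" + format(amount, ",.2f")


# Factored (DRY) version: the rule lives in exactly one place...
def format_money(amount, symbol):
    # ...but suppose `symbol` was added to the signature later: every
    # caller below had to be updated to pass it. That is the coupling.
    return symbol + format(amount, ",.2f")


class Invoice:
    def total_text(self, amount):
        return format_money(amount, "$")


class Receipt:
    def total_text(self, amount):
        return format_money(amount, "$")


print(Invoice().total_text(1234.5))  # $1,234.50
```

Changing how money is formatted touches one function in the factored version, while changing that function's signature touches every caller. Either way, finding all of the affected places is exactly the job a good "find references" tool does for you.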
So, if your code follows the DRY principle, you have more coupling. You don't have duplication, which is good for maintainability. But you've broken the code down into lots of small pieces; pieces that are coupled together. Learning and understanding a code base like that can be initially challenging as well, since you need to figure out how all the little pieces work together. It's not as challenging as figuring out a poorly written code base, but it is still challenging. In order to figure it out, you need to navigate more and understand the connections and dependencies, and that's when having a good code browser really helps. So I can understand why this tool is so important to Google.