About 25% of a developer's time is spent searching for information. It's well spent, though -- finding reusable code can get a project done on time and with high quality results.
By Ken Krugler, CTO and John D. Mitchell, Chief Architect, LinuxWorld.com
Wait! Keep reading! This is not yet-another-methodology that promises to solve all of your programming problems. What we'll be discussing in this article is why search has become a critical tool for developers.
Search driven development (SDD) is an easy label that we can put on a simple fact of life in modern software development: searching for technical information is a large, costly part of creating and using information technologies.
Let's take a real world example. Hari Jayaram is a postdoctoral researcher in biochemistry. Like all developers, he uses code written by others -- in his case, the seqhound bioinformatics API from Unleashed Informatics. And, like all developers, his code didn't "just work" the first time he ran it.
So now what? A typical developer will do things like:
* Look for additional documentation on the API.
* Read newsgroups for people having the same problem.
* Search the company's site for help with the API.
* Or, as in Jayaram's case, search for code examples where other people successfully used the API.
What all of the above approaches have in common is that they involve search as a way to find the information needed to solve the problem at hand.
OK, you say, search helped make Jayaram happy but that's not how *I* program. Heck, I don't even use open source code and I can grep my own code so what more do I need?!
Good question, and one which we'll answer five different ways.
Reason No. 1: We're already searching every day
As we mentioned above, searching for technical information is a large, costly part of creating and using information technologies, and it's a cost that we are already paying whether we like it or not.
So how "large and costly" is all this technical searching? Well, recent independent research says that developers spend about 25% of their time just searching for information. No wonder people worry about a 'software crisis'! Between too many meetings, bad requirements, and hunting for useful information, we're stuck working long hours and still feeling like we haven't accomplished what we need to (let alone all that we really want to).
Some folks, often managers, are skeptical that developers really spend 25% of the time looking for technical information. If we break it down even a little bit, the number often seems quite low. For example, how much of our time doing debugging is actually looking for answers in FAQs and traditional documentation, or searching blogs for people who have already run into the same problem and figured out work-arounds, or looking through mailing list and newsgroup archives for hints, or even trawling bug databases in the hope that a newer version will magically fix the problem? How much of our initial development time is spent looking for examples of how to use some module? How much of our maintenance time is spent trying to find the right spots to update or looking around to remember why we did things in certain ways?
In some ways, search has always been a key component of software development: cut-and-paste usually involves search-and-replace; local function/method indexing helps you with faster navigation; the myriad variations of grep help you sift through files; we use the 'find' utility because our directory hierarchies never quite live up to our needs; and we look around the big, centralized code archive sites and use the common text-based search engines.
Reason No. 2: The expanding universe of technical information
How much "technical stuff" is out there for programmers? Some rough estimates of the significant sources of useful information indicate that there are:
* 100,000,000 technically oriented Web pages
* 20,000,000 source code files
* 100,000 active open source projects
* 5,000 technical books totaling more than 2 million pages
That's a lot of "good stuff" and it's growing quickly. Heck, notwithstanding the existing ocean of open source code, there's a tidal wave of code being publicly released by companies as they continue to align themselves with the open source movement.
Back in the day, you could get a CD from your platform vendor of choice, maybe buy a book or two, and you were good to go. That's not true anymore, as public standards, open source and the Internet in general have contributed to a vast expansion and scattering of technical information.
Reason No. 3: You can't reuse the code if you can't find the code
While there have been many attempts to foster code reuse at various levels of scale, reuse in the large has generally failed for three reasons: attitude, integration, and location.
The attitude problem is easily summed up by the phrase "not invented here". That is, we all think we're better, faster than "those other idiots" so we'll just do it ourselves.
The integration problem is that there is a cost in integrating somebody else's code into one's own system.
Search doesn't solve these first two problems, which we've only briefly touched on above. But, search can make a dramatic difference in the location problem: how do we reduce the cost of finding code that we could potentially reuse? Especially if the code is not already on our hard disk and inside of our development environment.
But, one may ask, what about all of the open-source libraries and frameworks floating around that are actually being used? Sure, there are popular libraries and frameworks that do get widely used. However, dig under the hood a bit and it's easy to see that there is a lot more code out there that could be getting used but stays hidden because it's so hard to find.
In other words, the popularity of a project is the main way that most developers find code to reuse today. Alas, we have all run into the harsh reality that popularity doesn't automatically mean the code is good or that we can easily figure out how to install it, use it or fix it. Good search helps a developer find the best available solution, not just the top handful of well-known projects.
And even if you aren't using open source, a typical company that has more than one small room of developers has the same kinds of issues, just on a smaller scale. Somebody has often used the same API, solved the same problem, and run into the same bug...and everybody else just doesn't know it.
Reason No. 4: Just-in-time comprehension
Manually reading all of the sources of potential information isn't practical. Search is the only solution that has been found to work across the wide range of scales that we live with -- from individual projects to corporate divisions to the Internet. To be efficient, we really want highly effective search that can be used when needed to find the answers "just in time".
In fact, the researchers who coined the term "just in time comprehension", Timothy Lethbridge and Nicholas Anquetil, have shown that developers overwhelmingly work "just in time". Tellingly, existing development tools are still stuck in the old, heavyweight fantasy of the waterfall methodology.
Reason No. 5: Code-centric search engines
In recent years, code-centric search engines have begun to gain traction. Examples include Google's code search, Koders, Codase, Codefetch, and Krugle's code channel. In various ways, these code-centric engines allow searching through actual source code. The weaker solutions provide little more than traditional text-based searching over files that are only code while the stronger solutions are built to actually understand code as code. That is, they parse code as programming languages and so can understand the difference between comments and code, function definitions and calls, and one language from another.
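To make that distinction concrete, here is a minimal sketch in Python, using only the standard library's ast and tokenize modules. It is not how any of the engines named above actually work; the sample source and the helper names text_hits and code_hits are our own inventions for illustration. It contrasts a plain text match with a search that parses the code and so can tell a definition from a call from a comment.

```python
import ast
import io
import tokenize

# Hypothetical sample source, invented for this illustration.
SAMPLE = '''\
# parse_record is mentioned here only in a comment
def parse_record(line):
    return line.split(",")

def load(path):
    with open(path) as handle:
        return [parse_record(row) for row in handle]
'''


def text_hits(source, name):
    """Text-based search: line numbers of every line containing the name."""
    return [n for n, line in enumerate(source.splitlines(), 1) if name in line]


def code_hits(source, name):
    """Code-aware search: classify each occurrence of the name by its role."""
    hits = {"definition": [], "call": [], "comment": []}

    # Walk the syntax tree to find function definitions and direct calls.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.name == name:
                hits["definition"].append(node.lineno)
        elif isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name) and node.func.id == name:
                hits["call"].append(node.lineno)

    # Comments are not part of the syntax tree, so scan the token stream.
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and name in tok.string:
            hits["comment"].append(tok.start[0])

    return hits


if __name__ == "__main__":
    print(text_hits(SAMPLE, "parse_record"))
    print(code_hits(SAMPLE, "parse_record"))
```

The text search reports three matching lines and nothing more. The parsed search reports the same three lines but labels one as a comment, one as the definition, and one as a call, which is the kind of information that lets a developer ask for "examples of calling this function" rather than "every file that mentions it".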
Rather interestingly, code search all by itself doesn't solve the whole problem. We need all of the technical information around and about the code to be able to really fly. Code-only searching misses out on the fundamental strengths of the communities surrounding any code that is truly worth using. For instance, the best examples of how to use some piece of code are often embedded as small code snippets inside a magazine article or blog entry or, occasionally, even in the official technical documentation. Of course, we are a wee bit biased in this thinking because we have developed just such a search engine for developers: Krugle.
Summary
So, is search-driven development a pie-in-the-sky promise of developmental nirvana? Saving two or three hours a day that can be used catching up on sleep, doing other valuable development tasks, playing WoW, working on open-source projects, or, heaven forbid, even enjoying the rest of life might not be nirvana but it sure sounds awfully good.