Is the Semantic Snowball on a roll?
I should probably just lie low after my last foray against Sir Tim's ivory tower, but this begs for some pushback:
Berners-Lee hopes that life sciences will drive adoption of the Semantic Web, just as high-energy physics drove the early Web.
"Maybe we will meet a critical mass
in a certain area. The Web, for example, took off in high-energy
physics. When we got six high-energy physics Web sites, then it got
interesting for physicists to be onboard," he said. "Similarly, if we
could get critical mass in life sciences, if we get a half a dozen or a
dozen set of ontologies, the core ones for drug discovery out there,
then suddenly the Semantic Web within life sciences would have a
critical mass. It’ll snowball much more rapidly and it will be copied..."
Let's compare timelines for
how the plain ol' Web achieved critical mass and how the Semantic
Web has failed to do so. The Web was proposed in March 1989. A comparable date for the Semantic Web might be the September 1998 Semantic Web Road map,
not quite seven years ago. Seven years after the Web was proposed,
the .com bubble was swelling on Wall Street and Microsoft had announced a fundamental realignment to focus on the Web rather than proprietary networks. That was
an example of a critical mass snowballing (to mix the same metaphors
that Tim mixed). I don't see more than a few flurries of a snowball
effect for the Semantic Web today, despite years of non-stop
evangelizing and commitment of a large portion of W3C's resources.
That doesn't mean that semantic technologies of the sort the W3C has
developed are useless, just that they will be used primarily in niches
that are defined by adoption of an existing ontology. The bio-medical
industry may indeed be one niche where the semantic technologies can
flourish, because researchers and practitioners have invested a few
hundred years in coming up with a controlled vocabulary based on solid
scientific understanding of fundamental biological facts. These
ontologies can AFAIK be represented and manipulated using the
technologies of the Semantic Web. To use the example that roused me from my dogmatic skepticism a few years ago, one can leverage a scientifically sound controlled vocabulary such as SNOMED to formulate queries such as:
Of all the patients I operated on for brain tumors between
1996 and 2000, matching severity of pathology and matching clinical status and
who have the "P53" mutation, did PCV chemotherapy improve the cure rate at five years?
This would be close to impossible with SQL or XQuery -- one would have
to either have explicit markup for generic terms such as "brain tumor",
"severity of pathology", etc. in the data, or one would have to
explicitly handle all the matching terms in the query. Having a
controlled vocabulary/ontology that defines things like "glioblastoma
IS-A 'brain tumor'" can let users deal with the generic terms and let
some inference engine sort out the details. Still, I can't
imagine that this snowball will roll past the community of people who
actively use a controlled vocabulary for medical terms, nor will it
address more than a handful of the real problems of the life sciences
industry. Powerful, consistent ontologies are rare today because they
are just so hard to build. Efforts such as Cyc to apply this basic idea to everyday life have been stuck at the proof-of-concept stage for about 20 years now.
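To make the IS-A point above concrete, here is a minimal Python sketch. It is a hand-rolled stand-in for the kind of subsumption reasoning an RDFS/OWL inference engine performs, not an RDF or SNOMED API: the toy hierarchy in `IS_A`, the `PATIENTS` records, and the helpers `narrower_terms` and `patients_with` are all hypothetical. The query names only the generic term "brain tumor", and the code follows the IS-A links transitively to find every more specific term that should count as a match.

```python
from collections import defaultdict

# Hypothetical toy ontology: (narrower term, broader term) IS-A pairs.
# Illustrative only; these are NOT actual SNOMED concepts or codes.
IS_A = [
    ("glioblastoma", "astrocytoma"),
    ("astrocytoma", "glioma"),
    ("oligodendroglioma", "glioma"),
    ("glioma", "brain tumor"),
    ("meningioma", "brain tumor"),
]

# Hypothetical patient records, each coded with its most specific diagnosis.
PATIENTS = [
    {"id": 1, "diagnosis": "glioblastoma"},
    {"id": 2, "diagnosis": "oligodendroglioma"},
    {"id": 3, "diagnosis": "meningioma"},
    {"id": 4, "diagnosis": "fractured femur"},
]


def narrower_terms(root, is_a):
    """Return root plus every term that IS-A root, directly or transitively."""
    children = defaultdict(list)
    for narrow, broad in is_a:
        children[broad].append(narrow)
    found = {root}
    stack = [root]
    # Depth-first walk down the hierarchy; `found` guards against cycles.
    while stack:
        term = stack.pop()
        for child in children[term]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def patients_with(generic_term, patients, is_a):
    """Select ids of patients whose coded diagnosis falls under generic_term."""
    matching = narrower_terms(generic_term, is_a)
    return [p["id"] for p in patients if p["diagnosis"] in matching]


if __name__ == "__main__":
    print(sorted(narrower_terms("glioma", IS_A)))
    print(patients_with("brain tumor", PATIENTS, IS_A))
```

Running it prints the four glioma-family terms and then `[1, 2, 3]`. The design mirrors the argument: the query stays generic and the ontology carries the detail, so adding a new subtype to `IS_A` changes the results without touching the query. Without the ontology, the caller would have to list every specific diagnosis in the query itself.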
The Web snowballed down a wide, smooth slope
because its basic content was existing or easily authored
human-readable text, and the barriers to wider use of the content were
mechanical and easily overwhelmed by the snowball. The Semantic
Web must contend with a lot of trees and ravines -- the fragmentation
of human knowledge and its resistance to formalization -- that
stop or divert the snowball before it gets much size or momentum.
If powerful, consistent ontologies exist, the Semantic Web will handle the mechanics.
If they don't exist yet, RDF inference engines, SPARQL databases,
or OWL ontology editors can groom the slopes a bit, but they
won't knock down the trees.