Little chunks of meaning

No "data" in a triplestore?

With apologies to Ford commercials everywhere: People ask me: "Michael, Why Semantics? Why now?" and I say to them . . .

"Brace yourself, that data guzzler in your mainframe . . . . . . just might be . . . . . . groaning under the weight of the data/metadata dichotomy.

I don't know if you've heard, but that information integration thing is a pretty big deal."

So how does the Semantic model, aka the Linked Data Web, help?

In the Semantic model, each RDF triple carries two forms of metadata along with it (and/or pointers to that metadata, if not the real thing; if there even IS a "real" thing).

First, each triple connects a subject with an object under the auspices of a predicate to make some primary "chunk of meaning," not just to "aggregate" three data elements. Assigning a functionally specific role to each component serves as foundational metadata from the git-go. In English we call such a "chunk of meaning" a "sentence," without knowing EXACTLY what a sentence is "semantically" (and usually without caring).

So . . . the first form of (somewhat implicit) metadata is that triplestores hold sentences, whose subject-predicate-object structure lays a foundation for the "meaning" of each part within the sentence.
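Here is a minimal Python sketch of that idea, using only the standard library. The `Triple` class and its `as_sentence` method are hypothetical and exist only for illustration. The point is that each position has a named role, so the three values read back as a sentence rather than as an anonymous bundle of three data elements.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style "sentence": each position has a fixed, named role."""
    subject: str
    predicate: str
    obj: str  # the RDF "object" position

    def as_sentence(self) -> str:
        # Reading the roles in order yields an English-like sentence.
        return f"{self.subject} {self.predicate} {self.obj}."


t = Triple("Smith", "has friend", "Jones")
print(t.as_sentence())  # Smith has friend Jones.
print(t.predicate)      # has friend
```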

Second, most of the sentences in a triplestore are composed of pointers (URIs) to Web-accessible resources that can or do hold descriptions of the things being "discussed" by the triplestore sentences. The predicate pointers, in particular, can, do, and/or should describe the (meaning of the) relationships between subjects and objects being asserted by those predicates.

SOMEtimes the subject and object URIs are perverted to hold something that resembles "data." Either data values are used in URI names (as in http://bio2rdf.org/go:0004003), or a so-called "literal" value takes the place of an object URI, such as "25"^^example_namespace:int in Turtle notation. However, note that even this example literal has a URI embedded in it. Here "example_namespace" is an abbreviation for a URI, as you remember from the Book Report, and the contents of the "pointed at" URI will explain how to interpret the "25".

(In fact, you could say that the "25" is there to serve as an offset into the supplied URI, as with something like http://example_namespace/int#25, but that encoding would require a large file to cover all the integers.)
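Here is a rough Python sketch of how a program might pull the "25" and its datatype apart, then use the datatype URI to decide how to read the value. The prefix table, the `DATATYPE_RULES` lookup, and the function names are all hypothetical. `DATATYPE_RULES` stands in for whatever dereferencing the datatype URI would actually tell you. The naive string split is not a real Turtle parser. Real data would usually use XML Schema's `xsd:integer` rather than a made-up `int` type.

```python
from urllib.parse import urldefrag

# Hypothetical prefix table: "example_namespace" abbreviates a full URI.
PREFIXES = {"example_namespace": "http://example_namespace/"}

# Hypothetical stand-in for what dereferencing a datatype URI would tell us:
# how to turn the lexical form (the "25") into a usable value.
DATATYPE_RULES = {"http://example_namespace/int": int}


def parse_typed_literal(text: str) -> tuple[str, str]:
    """Split a Turtle-style typed literal like '"25"^^example_namespace:int'
    into its lexical form and its fully expanded datatype URI (naive split)."""
    lexical, datatype = text.split("^^", 1)
    prefix, local = datatype.split(":", 1)
    return lexical.strip('"'), PREFIXES[prefix] + local


def interpret(text: str):
    """Use the datatype URI to decide how to read the lexical form."""
    lexical, datatype_uri = parse_typed_literal(text)
    return DATATYPE_RULES[datatype_uri](lexical)


print(parse_typed_literal('"25"^^example_namespace:int'))
# ('25', 'http://example_namespace/int')
print(interpret('"25"^^example_namespace:int'))  # 25

# The "offset" reading: the same information carried as a URI fragment.
base, fragment = urldefrag("http://example_namespace/int#25")
print(base, fragment)  # http://example_namespace/int 25
```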

So . . . the second form of metadata is that provided by the resources identified by the URIs within triples, and/or by the simple presence of a URI begging to be dereferenced.

To me, this unique combination of data and metadata used to encode information is qualitatively different from the "DATA storage model." I think, in fact, that this is NOT a "DATA storage" approach and that triplestores (usually) do not even contain DATA!

So what is being encoded, if not data? "Meaning"? "Knowledge"? "Linkages"?

If I didn't abhor the thought that this could be construed as "real knowledge in the human sense," I might go with "knowledge," in its emaciated, machine-processing sense.

But I think I would rather just stick with "sentences," or "chunks of meaning".

Triplestores hold "sentences" encoded using RDF. They're NOT ABOUT DATA!

. . . and, IMHO, this is not just a matter of, as they say, "semantics". . .

P.S. I actually think semantic encoding is quite odd (or maybe it just seems so because my mind has been perverted by the relational model), but this triplestore, or "sentence-base," seems awfully intuitive:

Smith has age 21. Jones has age 45. Blake has age 12. George has age 21. Smith has friend Jones. Jones has friend Smith. Blake has friend Blake. George has friend Smith.

with queries like

"Someone has friend Smith" and "Someone has age 21"

to get "George".
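The sentence-base and query above fit in a few lines of Python. This is a sketch, not a triplestore. The sentences are plain string triples in a set. `subjects_matching` and `query` are hypothetical helpers that treat "Someone" as the single variable and intersect the answers to each pattern.

```python
# The sentence-base from the text, as (subject, predicate, object) triples.
SENTENCES = {
    ("Smith", "has age", "21"), ("Jones", "has age", "45"),
    ("Blake", "has age", "12"), ("George", "has age", "21"),
    ("Smith", "has friend", "Jones"), ("Jones", "has friend", "Smith"),
    ("Blake", "has friend", "Blake"), ("George", "has friend", "Smith"),
}


def subjects_matching(predicate: str, obj: str) -> set[str]:
    """All values of 'Someone' for the pattern: Someone <predicate> <obj>."""
    return {s for (s, p, o) in SENTENCES if p == predicate and o == obj}


def query(*patterns: tuple[str, str]) -> set[str]:
    """Conjunctive query: 'Someone' must satisfy every (predicate, object) pattern."""
    results = [subjects_matching(p, o) for p, o in patterns]
    return set.intersection(*results) if results else set()


print(query(("has friend", "Smith"), ("has age", "21")))  # {'George'}
```

"has friend Smith" matches Jones and George, and "has age 21" matches Smith and George, so only George satisfies both. In SPARQL, the same question would be a basic graph pattern of two triple patterns sharing one variable (say `?someone`), and the query engine would do the intersection.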

And isn't this sentence-base pretty well documented if "has friend" and "has age" are actually URLs to web pages describing "has friend" and "has age"?

Of course, this approach is not without problems. It may, for example, be space-inefficient, though some of that will be a function of the back-end storage model (possibly never seen by users). Most importantly, I don't see a well-developed syntax for detailed content search (due to the inability of SPARQL processors to "understand" dynamically defined types), so I'm going to guess that semantic approaches are not going to replace relational approaches.

Perhaps the semantic model will prove more useful for integrating extracts from large relational systems than for storing information de novo . . . which would explain why Bio2RDF could be put together with a relatively TINY amount of time and effort, compared with developing a shared schema, etc., to combine the 40 "data" collections it is said to include.

In place of unified relational schemas, however, groups would have to agree upon the meaning of the predicates for each data "field" to be integrated, and that could be complicated.
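Here is a hedged Python sketch of what that agreement buys. It assumes two hypothetical relational extracts that record the same idea under different column names. The `PREDICATE_FOR` table is the part the groups would have to negotiate. The example.org URLs are placeholders for wherever each predicate is documented. To keep it short, subjects are left as plain names rather than URIs.

```python
# Two hypothetical relational extracts whose columns name the same idea differently.
TABLE_A = [{"name": "Smith", "age": 21}, {"name": "Jones", "age": 45}]
TABLE_B = [{"person": "Blake", "years_old": 12}]

# The part the groups must agree on: which shared predicate each column maps to.
# The example.org URLs are placeholders, not a real vocabulary.
PREDICATE_FOR = {
    ("A", "age"): "http://example.org/vocab/hasAge",
    ("B", "years_old"): "http://example.org/vocab/hasAge",
}


def rows_to_triples(source, rows, key_column):
    """Turn each non-key cell of each row into one (subject, predicate, object) triple."""
    triples = set()
    for row in rows:
        subject = row[key_column]
        for column, value in row.items():
            if column == key_column:
                continue
            triples.add((subject, PREDICATE_FOR[(source, column)], value))
    return triples


# No shared schema needed: the two extracts merge by plain set union.
merged = rows_to_triples("A", TABLE_A, "name") | rows_to_triples("B", TABLE_B, "person")
for triple in sorted(merged):
    print(triple)
```

The merge itself is trivial. All of the real work, and all of the potential disagreement, sits in the predicate mapping.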

On the other, other hand, a triplestore like the one above admits rather naturally of processing by inference engines, whereas data in relational form are not so accommodating.
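To illustrate what "processing by inference engines" can mean, here is a minimal hand-rolled forward-chaining loop in Python. It is not any particular reasoner's API, and the function names are hypothetical. It applies one rule, that "has friend" is symmetric, to the friend sentences from the sentence-base above. OWL expresses this kind of rule with `owl:SymmetricProperty`. The loop keeps going until nothing new can be derived.

```python
FRIEND = "has friend"

# The friend sentences from the sentence-base in the text (ages omitted).
facts = {
    ("Smith", FRIEND, "Jones"), ("Jones", FRIEND, "Smith"),
    ("Blake", FRIEND, "Blake"), ("George", FRIEND, "Smith"),
}


def symmetric_rule(triples, predicate):
    """If X <predicate> Y holds, infer Y <predicate> X."""
    return {(o, p, s) for (s, p, o) in triples if p == predicate}


def forward_chain(triples, rules):
    """Apply every rule repeatedly until no new triple appears (a fixpoint)."""
    known = set(triples)
    while True:
        new = set().union(*(rule(known) for rule in rules)) - known
        if not new:
            return known
        known |= new


closed = forward_chain(facts, [lambda ts: symmetric_rule(ts, FRIEND)])
print(closed - facts)  # {('Smith', 'has friend', 'George')}
```

The only new sentence is "Smith has friend George". Every other reversed friendship was already asserted.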