Bigtable, a distributed storage system for managing structured data, resembles a database and shares implementation strategies with parallel and main-memory databases. Instead of a full relational data model, it offers a simple data model in which data is indexed by row and column names that can be arbitrary strings. A Bigtable is a sparse, distributed, persistent, multidimensional sorted map. The map is indexed by a row key, a column key, and a timestamp; each value in the map is an uninterpreted array of bytes. Clients often serialize various forms of structured and semi-structured data into these strings, and they can control the locality of their data through careful choices in their schemas.
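To make the "sorted map" idea concrete, here is a minimal in-memory sketch in Python. It is only a toy illustration of the data model described above, not Bigtable's actual API or implementation: the class name `ToyTable`, its methods, and the sample row and column names are all made up for this example.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple

# Key layout (an assumption of this sketch): (row key, column key, -timestamp).
# Negating the timestamp makes the newest version of a cell sort first.
Key = Tuple[str, str, float]


class ToyTable:
    """Toy model of a sparse, sorted map: (row, column, timestamp) -> bytes."""

    def __init__(self) -> None:
        self._keys: List[Key] = []         # kept sorted at all times
        self._cells: Dict[Key, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        key = (row, column, -timestamp)
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = value           # values are uninterpreted bytes

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the newest version written at or before `timestamp`."""
        bound = float("-inf") if timestamp is None else -timestamp
        i = bisect.bisect_left(self._keys, (row, column, bound))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan(self, start_row: str,
             end_row: str) -> Iterator[Tuple[str, str, int, bytes]]:
        """Yield all cells with start_row <= row < end_row, in sorted order."""
        lo = bisect.bisect_left(self._keys, (start_row,))
        hi = bisect.bisect_left(self._keys, (end_row,))
        for row, column, neg_ts in self._keys[lo:hi]:
            yield row, column, int(-neg_ts), self._cells[(row, column, neg_ts)]


table = ToyTable()
table.put("com.cnn.www", "contents:", 3, b"<html>v3</html>")
table.put("com.cnn.www", "contents:", 5, b"<html>v5</html>")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
table.put("org.example", "contents:", 1, b"<html>hi</html>")

latest = table.get("com.cnn.www", "contents:")
older = table.get("com.cnn.www", "contents:", timestamp=4)
cnn_cells = list(table.scan("com", "org"))
```

Because the keys are kept in sorted order, rows that share a prefix sit next to each other, so a range scan touches one contiguous slice. That is the sense in which the choice of row names controls locality. The map is sparse because a row stores only the columns that were actually written.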
PSE and other integration technologies may provide a higher level of semantic analysis.
These techniques could figure out the meaning of content and “fill in the blanks” when an item of information is ambiguous or missing. The idea is to enrich an information object with additional tags so that queries about lineage (where something came from) and likelihood of accuracy (the “correctness” of an information element) can be used to generate a result.
Another new concept is a probabilistic mediated schema that is created automatically from the data sources. Semantic mappings between the schemas of the data sources are mediated by a set of candidate schemas, each with a probability attached, so that uncertainty is modeled at the core of the system. A deterministic mediated schema derived from the probabilistic ones is exposed to the user, who can use its terminology to interact with the system.
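As a rough sketch of what such a structure might look like, the Python below represents a probabilistic mediated schema as a list of candidate schemas, each a grouping of source attributes with a probability. Everything here is an assumption made for illustration: the attribute names, the probabilities, the function names, and in particular the consolidation rule (two attributes stay together in the deterministic schema only if every candidate groups them together), which is one plausible choice rather than a confirmed description of any real system.

```python
from typing import FrozenSet, List, Set, Tuple

Cluster = FrozenSet[str]                  # source attributes treated as one concept
Schema = Set[Cluster]                     # one candidate mediated schema
PSchema = List[Tuple[Schema, float]]      # candidates with their probabilities


def same_cluster_probability(p_schema: PSchema, a: str, b: str) -> float:
    """Probability mass of the candidates that group attributes a and b."""
    return sum(
        prob
        for schema, prob in p_schema
        if any(a in cluster and b in cluster for cluster in schema)
    )


def consolidate(p_schema: PSchema) -> Schema:
    """Derive one deterministic schema (assumed rule: keep two attributes
    together only if every candidate schema keeps them together)."""
    schemas = [schema for schema, _ in p_schema]
    result: Schema = set(schemas[0])
    for schema in schemas[1:]:
        # Pairwise intersection refines the clusters found so far.
        result = {c1 & c2 for c1 in result for c2 in schema if c1 & c2}
    return result


# Hypothetical candidates built from made-up source attributes.
candidate_1: Schema = {
    frozenset({"name", "fullName"}),
    frozenset({"phone", "hPhone"}),
    frozenset({"oPhone"}),
}
candidate_2: Schema = {
    frozenset({"name", "fullName"}),
    frozenset({"phone", "hPhone", "oPhone"}),
}
p_med_schema: PSchema = [(candidate_1, 0.6), (candidate_2, 0.4)]

deterministic = consolidate(p_med_schema)
p_phone_office = same_cluster_probability(p_med_schema, "phone", "oPhone")
```

In this toy data the two candidates disagree only about whether `oPhone` belongs with the other phone attributes, so the deterministic schema keeps it separate, while the probabilities record how likely the merged reading is.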
The Semantic Web is emerging to help us get the most out of the world's information. Many interesting applications are already here, and some of them have already been acquired by major search players: Bing, for example, is based on semantic technology from Powerset, which Microsoft purchased in 2008. This blog article is only about one of the players organizing the world's information. Stay tuned for more.