Semantic Web
The vast amounts of computerized data contained on the World Wide Web would appear to be the largest body of information ever assembled. Certainly, the Web is a uniquely valuable tool for both research and the dissemination of ideas and knowledge. But the fact remains that the Web has been remarkably resistant to direct, effective, efficient use by computers.
Tim Berners-Lee, the Oxford University graduate who invented the Web in 1989, wrote the first Web browser and server in 1990 and currently directs the World Wide Web Consortium, has a much grander vision for the Web of the future, which he calls the Semantic Web. The Semantic Web adds a metadata infrastructure of tags to define elements of information within Web pages, linking them so computers can extract meaning from widely separated data as easily as the Internet currently links individual documents. The Semantic Web will make it possible for machines, as well as people, to find, read, understand and use data over the Web to accomplish useful tasks. The Semantic Web will extend, not replace, the Web as we know it today.
In some instances, we already use specialized software to work with carefully identified Web data, but this is the exception, not the rule. It takes people to surf the Web, shop online, make sense of search-engine results and decide which additional links to follow. The Semantic Web, once it becomes a functioning reality, will let a user launch an agent or process that will then proceed on its own, perhaps checking back with the user periodically as the work progresses.
The Internet was originally created as a way for researchers to easily exchange computer data with one another. Although data traveled across the Internet in the form of bits and bytes, the basic unit of meaning, as far as the computer systems were concerned, was the file.
That changed when the Web came into being. Berners-Lee built his Web around pages, which are documents written in HTML. A versatile language, HTML combines interactive forms, text and multimedia objects, such as images and sound, and it describes how these elements should be presented and what the overall page should look like. Unfortunately, HTML has a very limited ability to classify the blocks of text on a page, apart from the roles they play in a typical document's organization and in the chosen graphical layout.
As Web use grew, HTML's limitations led to the development of XML and XHTML, which began to offer mechanisms for adding meaning to Web pages. The Simple Object Access Protocol and Web services became a reality, making it easier for users and even automated processes to gather specific information or perform specialized functions across the Web. When the Semantic Web comes to fruition, software will be able to locate information within Web pages, thus breaking through the document level and accessing real data that it can use directly. In one sense, the Semantic Web will become a kind of global database.
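To make the idea of "adding meaning" concrete, here is a minimal Python sketch of how software might read data directly once a document's tags name what each value is. The catalog fragment, its element names (`item`, `name`, `price`) and the second item are hypothetical, invented for illustration; they do not come from any real vocabulary or standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog fragment. The element and attribute names are
# illustrative assumptions, not part of any published schema.
CATALOG_XML = """
<catalog>
  <item id="JG1896">
    <name>Acme widget</name>
    <price currency="USD">9.95</price>
  </item>
  <item id="JG1897">
    <name>Acme gadget</name>
    <price currency="USD">14.50</price>
  </item>
</catalog>
"""


def load_items(xml_text):
    """Return {item id: (name, price)} by reading the named elements."""
    root = ET.fromstring(xml_text.strip())
    items = {}
    for item in root.findall("item"):
        name = item.findtext("name")
        price = float(item.findtext("price"))
        items[item.get("id")] = (name, price)
    return items


catalog = load_items(CATALOG_XML)
print(catalog["JG1896"])
```

Because each value sits inside a tag that says what it is, the program can ask for an item's price by name instead of guessing from where the text appears on a page.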
Making Machines Smarter
In an age when grandmothers and kindergartners use computers and surf the Web, it’s sometimes hard to recall just how much direction or guidance a user has to give a computer to accomplish anything. Machines can’t use partial information, they don’t know what’s inside an image or graphic, they’re not much good at making analogies or combining information from different sources, and they don’t have a big vocabulary.
We can easily use the Web to look up a Computerworld article or blog, buy a book, locate an eye doctor near our workplace or put out a question on a chat forum or bulletin board. But ask your computer to do the same thing, and it won’t know where to start unless you give it a detailed, correctly spelled series of commands and responses in the proper sequence.
For example, using HTML and a Web browser, one can create and present a catalog page of items for sale. But HTML has no inherent capability to know that, say, Item No. JG1896 is an Acme widget with a retail price of $9.95. All HTML can specify is that the text “JG1896” should be positioned near the text “Acme widget” and “$9.95”. HTML has no way to express or know that “Acme widget” is a kind of consumer product, that “$9.95” is a price, or that these pieces of information describe an item that is distinct from other items listed on the same page.
The Semantic Web will address that by enabling computers and software to find, read, understand and use information contained inside Web documents to accomplish useful tasks via automated agents and Web-based services.
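The catalog example above can be sketched in a few lines of Python. The first half shows what a program sees in an HTML table row: three unlabeled strings. The second half shows one possible machine-readable description of the same item as simple subject, predicate, object statements. The row markup, the predicate names (`name`, `type`, `retailPrice`), the `item:` prefix and the second item are all assumptions made for illustration, not an actual Semantic Web vocabulary.

```python
from html.parser import HTMLParser

# Hypothetical table row from a catalog page.
PAGE_HTML = "<tr><td>JG1896</td><td>Acme widget</td><td>$9.95</td></tr>"


class TextCollector(HTMLParser):
    """Collect the text fragments of a page, in document order."""

    def __init__(self):
        super().__init__()
        self.fragments = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.fragments.append(text)


collector = TextCollector()
collector.feed(PAGE_HTML)
collector.close()
# Only three strings come back; nothing marks which one is a price.
html_view = collector.fragments

# Hypothetical statements describing the same catalog page.
# Predicate names are illustrative assumptions.
STATEMENTS = [
    ("item:JG1896", "name", "Acme widget"),
    ("item:JG1896", "type", "ConsumerProduct"),
    ("item:JG1896", "retailPrice", 9.95),
    ("item:JG1897", "name", "Acme gadget"),
    ("item:JG1897", "type", "ConsumerProduct"),
    ("item:JG1897", "retailPrice", 14.50),
]


def lookup(statements, subject, predicate):
    """Return the first object stated for (subject, predicate), or None."""
    for s, p, o in statements:
        if s == subject and p == predicate:
            return o
    return None


print(html_view)
print(lookup(STATEMENTS, "item:JG1896", "retailPrice"))
```

With the statements, software can tell that 9.95 is the retail price of item JG1896 and that JG1897 is a separate item, which the HTML text alone does not express.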