A Simple XML Tutorial, Part 2 - 巨人网络通讯

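Troubleshooting Well-Formedness
My mom's list holds dozens of recipes, maybe hundreds. If a fatal
error creeps in, debugging gets painful: you end up going through
the file line by line looking for the missing tag. With several
levels of nesting, the error is even harder to spot.

But good help is at hand. Parsers, applications that read XML code
and report well-formedness errors, can be had free on the net. The
best of them is Lark, written by Tim Bray, technical editor of the
XML specification, one of its most ardent evangelists, and one of
the smartest people on the planet.

I ran the following code through Lark. Notice that "chocolate chips"
and its closing tag are in the wrong place relative to the
</ingredients> tag: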

<?xml version="1.0"?>
<list>
<recipe>
<author>Carol Schmidt</author>
<recipe_name>Chocolate Chip Bars</recipe_name>
<meal>Dinner
<course>Dessert</course>
</meal>
<ingredients>
<item>2/3 C butter</item>
<item>2 C brown sugar</item>
<item>1 tsp vanilla</item>
<item>1 3/4 C unsifted all-purpose flour</item>
<item>1 1/2 tsp baking powder</item>
<item>1/2 tsp salt</item>
<item>3 eggs</item>
<item>1/2 C chopped nuts</item>
<item>
</ingredients>2 cups (12-oz pkg.) semi-sweet choc.
chips</item>
<directions>
Preheat oven to 350 degrees. Melt butter;
combine with brown sugar and vanilla in large mixing bowl.
Set aside to cool. Combine flour, baking powder, and salt; set aside.
Add eggs to cooled sugar mixture; beat well. Stir in reserved dry
ingredients, nuts, and chips.
Spread in greased 13-by-9-inch pan. Bake for 25 to 30 minutes
until golden brown; cool. Cut into squares.
</directions>
</recipe>
</list>

Here is what the parser returned:

Error Report

Line 17, column 22: Encountered </ingredients> expected </item>
... assumed </item>
Line 18, column 36: Encountered </item> with no start-tag.
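
A comparable check can be sketched without Lark, using the parser in
Python's standard library. This is only an illustration under stated
assumptions: check_well_formed is a hypothetical helper name, the
sample is a cut-down version of the recipe with the same mistake, and
the wording of the message comes from Python's parser, not from Lark.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return None if text is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


# Cut-down recipe with the same mistake: </ingredients> arrives
# while the last <item> is still open.
broken = """<list>
<ingredients>
<item>3 eggs</item>
<item>
</ingredients>2 cups choc. chips</item>
</list>"""

problem = check_well_formed(broken)
if problem is None:
    print("well-formed")
else:
    line, column, message = problem
    print(f"line {line}, column {column}: {message}")
```

Unlike Lark, which assumed the missing </item> and went on to report
a second problem, this parser stops at the first error it meets.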

With this kind of information, tracking down the error is no longer
a problem. So what does it mean for an XML file to be valid?

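Achieving Validity
In the end we want to put information into well-formed XML
documents. In practice there is still work to do, and danger still
lurks: an XML file can be well-formed and yet be missing key
information. Look at the following example:

<recipe>
<author>Carol Schmidt</author>
<recipe_name>Chocolate Chip Bars</recipe_name>
<meal>Dinner <course>Dessert</course> </meal>
<ingredients> </ingredients>
<directions>Melt butter; combine with, etc. ... </directions>
</recipe>
This recipe lists no ingredients, and because it is well-formed, the
Lark parser finds nothing wrong with it. Anyone who has managed even
the most benign database knows the mistakes we humans tend to make:
given the chance, we leave out key information and add useless
junk. That is why the inventors of XML introduced the DTD, the
Document Type Definition. A DTD provides a way of ensuring that the
XML is more or less what you had in mind.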
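
To make the gap concrete, the sketch below parses the recipe above
with Python's standard library and then applies one rule by hand.
The helper missing_ingredients is hypothetical and exists only for
this illustration; a DTD lets you declare a rule like this instead
of coding it yourself.

```python
import xml.etree.ElementTree as ET

recipe_xml = """<recipe>
<author>Carol Schmidt</author>
<recipe_name>Chocolate Chip Bars</recipe_name>
<meal>Dinner <course>Dessert</course> </meal>
<ingredients> </ingredients>
<directions>Melt butter; combine with, etc. ... </directions>
</recipe>"""

# Parsing succeeds without complaint: the document is well-formed.
recipe = ET.fromstring(recipe_xml)


def missing_ingredients(recipe):
    """Hand-written rule (hypothetical): True if no <item> is listed."""
    ingredients = recipe.find("ingredients")
    return ingredients is None or len(ingredients.findall("item")) == 0


print(missing_ingredients(recipe))  # True
```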

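Let's look at a DTD for the recipes.

<!DOCTYPE list [
<!ELEMENT list (recipe+)>
<!ELEMENT recipe (recipe_name, author?, meal, ingredients, directions)>
<!ELEMENT ingredients (item+)>
<!ELEMENT meal (#PCDATA | course)*>
<!ELEMENT item (#PCDATA | sub_item)*>
<!ELEMENT recipe_name (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT course (#PCDATA)>
<!ELEMENT sub_item (#PCDATA)>
<!ELEMENT directions (#PCDATA)>
]>
This code looks unfriendly at first, but it makes sense once you
take it apart. Let's go through it in detail:

<!DOCTYPE list [

This line says that what sits between the square brackets is the DTD
for a document whose root element is <list>. As mentioned earlier,
the root element contains all the other elements.

<!ELEMENT recipe (recipe_name, author?, meal, ingredients, directions)>

This line defines the <recipe> tag. The parentheses say that the
listed tags must appear inside <recipe> in exactly this order. The
question mark after author means that zero or one pair of <author>
tags may appear, so a recipe is allowed to omit its author.

<!ELEMENT meal (#PCDATA | course)*>

This line needs a detailed explanation. I have the following
structure in mind:

<meal>Here the meal name is mandatory
<course>One course name may appear, but it is not
mandatory</course>
</meal>
I did this because, to my mind, lunch does not necessarily call for
a particular course, but dinner may need to specify an appetizer, a
main course, and a dessert. #PCDATA stands for parsed character
data, that is, text the parser reads rather than binary data. Here,
#PCDATA is text such as "Dinner".

An element that mixes text with child tags has to be declared in
exactly this form: #PCDATA first, then the permitted child tags
separated by vertical bars, and an asterisk after the closing
parenthesis. The DTD therefore allows text and any number of
<course> tags inside <meal>, in any order. It cannot insist that the
meal name be present or that <course> appear at most once, so those
two rules remain conventions for whoever writes the file.

Now let's look at the next line:

<!ELEMENT ingredients (item+)>

The plus sign here means that at least one pair of <item> tags must
appear inside the <ingredients> tag. The same sign in the
declaration of <list> requires at least one <recipe>.

The last line of interest is:

<!ELEMENT item (#PCDATA | sub_item)*>

I put sub_item in as a safety measure. Besides the text of each
item, I wanted to allow for items that break down into several
parts. The asterisk says that any number of sub-items, including
none, may appear inside the <item> tag. I don't need any sub-items
for the Chocolate Chip Bars recipe, but they come in handy when an
ingredient is more complicated.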
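
One way to get a feel for the question mark, plus sign, and asterisk
is to read them as regular-expression operators over the sequence of
child tags. The sketch below does this for simple, comma-separated
content models only. It is not a DTD validator: it ignores #PCDATA,
vertical bars, and nested groups, and model_to_regex and
children_match are hypothetical helper names chosen for this
illustration.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(model):
    """Turn a model such as "(item+)" into a regex over child tag names.

    Only names with an optional ?, + or * suffix are supported.
    """
    parts = []
    for token in model.strip("() ").split(","):
        token = token.strip()
        suffix = token[-1] if token[-1] in "?+*" else ""
        name = token.rstrip("?+*")
        # Each child is encoded as its tag name followed by one space.
        parts.append(f"(?:{re.escape(name)} ){suffix}")
    return re.compile("".join(parts))


def children_match(element, model):
    """True if the element's child tags, in order, satisfy the model."""
    sequence = "".join(child.tag + " " for child in element)
    return model_to_regex(model).fullmatch(sequence) is not None


filled = ET.fromstring("<ingredients><item>3 eggs</item></ingredients>")
empty = ET.fromstring("<ingredients></ingredients>")

print(children_match(filled, "(item+)"))  # True
print(children_match(empty, "(item+)"))   # False
print(children_match(empty, "(item*)"))   # True
```

The plus sign rejects the empty <ingredients> tag, while the asterisk
accepts it. A real validating parser applies the same idea to every
declaration in the DTD.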

Now let's put all of this together and see what we get.

A Complete DTD Example
Here is a complete example. I've added another recipe to the file
and commented the DTD. Notice that I use sub-items in the second
recipe.

<?xml version="1.0"?>
<!--This starts the DTD. The first five declarations address document structure-->
<!DOCTYPE list [
<!ELEMENT list (recipe+)>
<!ELEMENT recipe (recipe_name, author?, meal, ingredients, directions)>
<!ELEMENT ingredients (item+)>
<!ELEMENT meal (#PCDATA | course)*>
<!ELEMENT item (#PCDATA | sub_item)*>
<!--These are the remaining elements of the recipe tag -->
<!ELEMENT recipe_name (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT directions (#PCDATA)>
<!--The remaining element of the meal tag -->
<!ELEMENT course (#PCDATA)>
<!--The remaining element of the item tag -->
<!ELEMENT sub_item (#PCDATA)>
]>
<list>
<recipe>
<recipe_name>Chocolate Chip Bars</recipe_name>
<author>Carol Schmidt</author>
<meal>Dinner
<course>Dessert</course>
</meal>
<ingredients>
<item>2/3 C butter</item>
<item>2 C brown sugar</item>
<item>1 tsp vanilla</item>
<item>1 3/4 C unsifted all-purpose flour</item>
<item>1 1/2 tsp baking powder</item>
<item>1/2 tsp salt</item>
<item>3 eggs</item>
<item>1/2 C chopped nuts</item>
<item>2 cups (12-oz pkg.) semi-sweet choc. chips</item>
</ingredients>
<directions>
Preheat oven to 350 degrees. Melt butter;
combine with brown sugar and vanilla in large mixing bowl.
Set aside to cool. Combine flour, baking powder, and salt;
set aside. Add eggs to cooled sugar mixture; beat well.
Stir in reserved dry ingredients, nuts, and chips.
Spread in greased 13-by-9-inch pan.
Bake for 25 to 30 minutes until golden brown; cool.
Cut into squares.
</directions>
</recipe>
<recipe>
<recipe_name>Pasta with Tomato Sauce</recipe_name>
<meal>Dinner
<course>Entree</course>
</meal>
<ingredients>
<item>1 lb spaghetti</item>
<item>1 16-oz can diced tomatoes</item>
<item>4 cloves garlic</item>
<item>1 diced onion</item>
<item>Italian seasoning
<sub_item>oregano</sub_item>
<sub_item>basil</sub_item>
<sub_item>crushed red pepper</sub_item>
</item>
</ingredients>
<directions>
Boil pasta. Sauté garlic and onion.
Add tomatoes. Serve hot.
</directions>
</recipe>
</list>
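With a DTD in place, a validating parser checks whether the document
obeys the restrictions the DTD lays down. In other words, we can
make sure the document is valid.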