Introduction to Document Type Definition

A Document Type Definition (DTD) defines the legal building blocks of an XML document. It defines the document structure with a list of legal elements and attributes together with their properties.

Defining XML Elements

First DTD Example (without attributes)

<!ELEMENT Book (Title, Author, ISBN?)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>

The first line specifies that, in an instance document, the content of the element Book must consist of a Title subelement, followed by an Author element (the order is significant!), and then an optional ISBN element (marked by the character ?). The comma (,) is a special character denoting a sequence. Remember also that XML is case-sensitive.

The second line specifies that the content of the Title element is text of type #PCDATA (parsed character data). The remaining lines specify the same for Author and ISBN.

The markup below is an example of an instance document that is valid against the above DTD:

<Book>
 <Title>XML Bible</Title>
 <Author>E.R. Harold</Author>
 <ISBN>87990234</ISBN>
</Book>
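
To illustrate the ordering and optionality rules, here is a sketch of two further instance documents checked against the same DTD. They are separate documents shown together for comparison, and they reuse the book data from above.

```xml
<!-- Valid: ISBN is optional, so it may be omitted -->
<Book>
 <Title>XML Bible</Title>
 <Author>E.R. Harold</Author>
</Book>

<!-- Not valid: Author appears before Title, which violates the sequence -->
<Book>
 <Author>E.R. Harold</Author>
 <Title>XML Bible</Title>
</Book>
```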

How to encode alternative subelements

Suppose that a book has either an author or an editor, i.e. the content of a Book element should contain either an Author element or an Editor element. The example below shows how to define the element Book under this constraint. The special character | denotes an alternative (or):

<!ELEMENT Book (Title, (Author|Editor), ISBN?)>
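
As a sketch of how the alternative plays out in an instance document, the fragment below assumes that Editor is declared as text, like Author. That declaration and the editor's name are illustrative assumptions, not part of the original DTD.

```xml
<!-- Assumed additional declaration: <!ELEMENT Editor (#PCDATA)> -->
<Book>
 <Title>XML Bible</Title>
 <Editor>A. N. Editor</Editor>
 <ISBN>87990234</ISBN>
</Book>
```

A Book that contains both an Author and an Editor, or neither of them, would not be valid against this declaration.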

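In some cases alternatives can also be used to relax the ordering of subelements, although this only scales to a small number of subelements because every permitted order must be listed explicitly: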

<!ELEMENT Lecturer ((Name,Phone)|(Phone,Name))>

In the above Lecturer element, the Name and Phone elements may appear in either order.
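
A minimal sketch of two instances that should both be accepted, assuming Name and Phone are declared as text. The declarations and the phone number are hypothetical.

```xml
<!-- Assumed declarations for the subelements:
     <!ELEMENT Name (#PCDATA)>
     <!ELEMENT Phone (#PCDATA)> -->

<!-- Valid: Name first -->
<Lecturer>
 <Name>Adrian Giurca</Name>
 <Phone>555-0100</Phone>
</Lecturer>

<!-- Also valid: Phone first -->
<Lecturer>
 <Phone>555-0100</Phone>
 <Name>Adrian Giurca</Name>
</Lecturer>
```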

Cardinality Constraints for Elements

What if a book may have many authors and not just one? Such a requirement is handled by placing cardinality constraints on the elements of a content model.

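Symbol      Meaning
?           At most 1 (0 or 1)
*           0 or more
+           1 or more
(none)      Exactly 1

For example, the following declaration allows a book to have one or more authors, or a single editor:

<!ELEMENT Book (Title, (Author+|Editor), ISBN?)>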
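
An instance with several authors might then look like the sketch below. The second author name is a placeholder invented for illustration.

```xml
<!-- Valid against (Title, (Author+|Editor), ISBN?): two Author elements, no ISBN -->
<Book>
 <Title>XML Bible</Title>
 <Author>E.R. Harold</Author>
 <Author>A. N. Other</Author>
</Book>
```

Note that this content model allows either several authors or exactly one editor, but not a mixture of the two.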

Defining Attributes of XML Elements

A DTD Example (with attributes)

Element attributes are declared with the keyword ATTLIST. The general syntax is

<!ATTLIST element_name attribute_name attribute_type default_declaration>

For example, the DTD below declares an element SalesOrder with three attributes: no, customerNo and deliveryDate.

The attribute no has the type ID and is required (i.e. it cannot be missing from any SalesOrder element):

<!ELEMENT SalesOrder (SoLine+)>
<!ATTLIST SalesOrder 
          no ID #REQUIRED 
          customerNo CDATA #REQUIRED
          deliveryDate CDATA #REQUIRED>
<!ELEMENT SoLine EMPTY >
<!ATTLIST SoLine  
          itemNo ID #REQUIRED 
          quantity CDATA #REQUIRED 
          description CDATA #IMPLIED>

An XML document that is valid against the above DTD is:

<SalesOrder no="so1234" customerNo="234567" deliveryDate="29092008">
 <SoLine 
   itemNo="dvd12345" 
   quantity="1" 
   description="DVD - 1492: Conquest of Paradise (1992)"/>
</SalesOrder>

Notice that the attribute order in an element is not significant.

Attribute Constraints

Symbol            Meaning
#REQUIRED         mandatory
#IMPLIED          optional
#FIXED "value"    constant value
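
The sketch below illustrates #FIXED together with a plain default value. The element Price and its attributes currency and taxIncluded are hypothetical names that do not appear in the DTDs above.

```xml
<!-- DTD declarations (hypothetical) -->
<!ELEMENT Price (#PCDATA)>
<!ATTLIST Price
          currency CDATA #FIXED "EUR"
          taxIncluded (yes|no) "yes">

<!-- Valid instance: both attributes are omitted, so a parser that reads
     the DTD supplies currency="EUR" and taxIncluded="yes" -->
<Price>19.99</Price>

<!-- Not valid: a #FIXED attribute, if present, must carry the fixed value -->
<Price currency="USD">19.99</Price>
```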

A complete DTD Example

The example below is a DTD for an XML email format:

<!ELEMENT Email (Head,Body)>
<!ELEMENT Head (From,To+,Cc*,Subject)>
<!ELEMENT From EMPTY>
<!ATTLIST From name CDATA #IMPLIED address CDATA #REQUIRED>
<!ELEMENT To EMPTY>
<!ATTLIST To name CDATA #IMPLIED address CDATA #REQUIRED>
<!ELEMENT Cc EMPTY>
<!ATTLIST Cc name CDATA #IMPLIED address CDATA #REQUIRED>
<!ELEMENT Subject (#PCDATA)>
<!ELEMENT Body (Text, Attachment*)>
<!ELEMENT Text (#PCDATA)>
<!ELEMENT Attachment EMPTY>
<!ATTLIST Attachment type (mime|binhex) "mime" filename CDATA #REQUIRED>

and a corresponding valid XML instance document:

<Email>
 <Head>
  <From name="Adrian Giurca" address="giurca@tu-cottbus.de"/>
  <To name="Mihai Gabroveanu" address="mihaiug@inf.ucv.ro"/>
  <Subject>Where is your chapter?</Subject>
 </Head>
 <Body>
  <Text>
  Where is the thesis chapter you promised me last week?
  </Text>
 </Body>
</Email>
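
The instance above uses neither Cc nor Attachment. As a sketch, a message that exercises both could look as follows, assuming the Attachment element takes either a type attribute defaulting to mime or no explicit type at all. The Cc address and the file name are placeholders.

```xml
<Email>
 <Head>
  <From name="Adrian Giurca" address="giurca@tu-cottbus.de"/>
  <To name="Mihai Gabroveanu" address="mihaiug@inf.ucv.ro"/>
  <!-- Cc* allows zero or more Cc elements; name is #IMPLIED and omitted here -->
  <Cc address="someone@example.org"/>
  <Subject>Where is your chapter?</Subject>
 </Head>
 <Body>
  <Text>Please find my comments attached.</Text>
  <!-- type is omitted, so the declared default "mime" applies -->
  <Attachment filename="comments.txt"/>
 </Body>
</Email>
```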

DTD Datatypes

Symbol                      Meaning
#PCDATA                     parsed character data
CDATA                       character data
ID                          Identifier that is unique within the document
IDREF                       Reference to an identifier (like a foreign key)
( value1 | value2 |...)     Enumeration
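
The ID and IDREF types can be illustrated with a small sketch. The elements Shop, Customer and Order and their attributes are hypothetical and independent of the SalesOrder DTD above.

```xml
<!-- DTD declarations (hypothetical) -->
<!ELEMENT Shop (Customer+, Order+)>
<!ELEMENT Customer (#PCDATA)>
<!ATTLIST Customer id ID #REQUIRED>
<!ELEMENT Order EMPTY>
<!ATTLIST Order
          no ID #REQUIRED
          customer IDREF #REQUIRED>

<!-- Valid instance: the IDREF value matches the ID of an element in the same document -->
<Shop>
 <Customer id="c234567">Jane Doe</Customer>
 <Order no="so1234" customer="c234567"/>
</Shop>
```

A validating parser should reject the document if customer pointed to a value that no ID attribute carries, or if two ID attributes shared the same value. ID values must also be XML names, which is why they start with a letter here.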

PCDATA means parsed character data. It is the usual text found between the start tag and the end tag of an XML element. PCDATA is text that will be examined by the parser for entities and markup: tags inside the text are treated as markup and entities are expanded. PCDATA must not contain literal & or < characters; these need to be represented by the &amp; and &lt; entities, respectively. The > character may be written as &gt; and must be escaped when it follows the string ]].

As an attribute type, CDATA (character data) means that the attribute value is an arbitrary string. Such a value cannot contain markup, but character and entity references inside it are still expanded.

A potentially confusing issue is that there is *another* "CDATA": the CDATA section, also referred to as a marked section. A CDATA section is a portion of element #PCDATA content delimited with special strings: <![CDATA[ to open the section, and ]]> to close it. Text inside a CDATA section is NOT parsed: tags inside it are NOT treated as markup and entities are not expanded.
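
As a brief sketch of the difference, the two fragments below carry the same character content. They assume a Text element declared as (#PCDATA), as in the email DTD above.

```xml
<!-- Ordinary #PCDATA: special characters are written as entity references -->
<Text>if a &lt; b &amp;&amp; b &gt; c then ...</Text>

<!-- CDATA section: the same text, with nothing escaped -->
<Text><![CDATA[if a < b && b > c then ...]]></Text>
```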

Defining an XML entity in a DTD

The example below shows how to define an entity in a DTD:

<!ENTITY mfg "Mit freundlichen Gr&#252;&#223;en,">

and how to use it in an XML document:

<text> &mfg; </text>

As a result, the entity substitution will give:

Mit freundlichen Grüßen,
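
For completeness, here is a sketch of a self-contained document in which the entity is declared in a DTD embedded in the document itself. The declaration of the text element as (#PCDATA) is an assumption added so that the document is valid.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE text [
 <!ELEMENT text (#PCDATA)>
 <!ENTITY mfg "Mit freundlichen Gr&#252;&#223;en,">
]>
<text> &mfg; </text>
```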

Assigning a DTD to an XML Document

To assign a DTD to an XML document you must use a document type declaration. This declaration is placed after the (optional) XML declaration, e.g. <?xml version="1.0" encoding="UTF-16"?>, and before the root element.

How to assign a local DTD file to an XML Document

<!DOCTYPE Email SYSTEM "email.dtd" >

Notice that Email is assumed to be the root element of the corresponding instance document, and email.dtd is the location (a path or URI, here relative to the document) of the local file where the DTD definitions are saved.
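
Put together, an instance document that refers to the DTD might look like the sketch below. It assumes that email.dtd contains the email DTD shown earlier and sits in the same directory as the document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Email SYSTEM "email.dtd">
<Email>
 <Head>
  <!-- name is #IMPLIED, so it may be left out -->
  <From address="giurca@tu-cottbus.de"/>
  <To address="mihaiug@inf.ucv.ro"/>
  <Subject>Where is your chapter?</Subject>
 </Head>
 <Body>
  <Text>Where is the thesis chapter you promised me last week?</Text>
 </Body>
</Email>
```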

How to assign a public DTD to an XML Document

If your DTD is publicly available on the Web (i.e. there is a URL where the DTD can be retrieved), you can reference it with a public identifier followed by that URL:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

In the above example, html is assumed to be the root element, "-//W3C//DTD XHTML 1.0 Strict//EN" is the public name of the DTD, and http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd is the URL where the DTD can be retrieved from the Web.
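
A minimal sketch of a complete document that uses this declaration is given below. The title and the paragraph text are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
 <head>
  <title>Minimal page</title>
 </head>
 <body>
  <p>Hello, world.</p>
 </body>
</html>
```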

How to assign a DTD inside an XML Document

<?xml version="1.0" encoding="UTF-8"?>
<!--Sample XML file with embedded DTD definitions-->
<!DOCTYPE Email [
 <!ELEMENT Email (Head,Body)>
 <!ELEMENT Head (From,To+,Cc*,Subject)>
 <!ELEMENT From EMPTY>
 <!ATTLIST From name CDATA #IMPLIED address CDATA #REQUIRED>
 <!ELEMENT To EMPTY>
 <!ATTLIST To name CDATA #IMPLIED address CDATA #REQUIRED>
 <!ELEMENT Cc EMPTY>
 <!ATTLIST Cc name CDATA #IMPLIED address CDATA #REQUIRED>
 <!ELEMENT Subject (#PCDATA)>
 <!ELEMENT Body (Text, Attachment*)>
 <!ELEMENT Text (#PCDATA)>
 <!ELEMENT Attachment EMPTY>
 <!ATTLIST Attachment type (mime|binhex) "mime" filename CDATA #REQUIRED>
]>
<Email>
 <Head>
  <From name="Adrian Giurca" address="giurca@tu-cottbus.de"/>
  <To name="Mihai Gabroveanu" address="mihaiug@inf.ucv.ro"/>
  <Subject>Where is your chapter?</Subject>
 </Head>
 <Body>
  <Text>
  Where is the thesis chapter you promised me last week?
  </Text>
 </Body>
</Email>
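
An external DTD and embedded declarations can also be combined in a single document type declaration. The sketch below assumes that email.dtd holds the email DTD and adds one locally declared entity. The entity name sig and its text are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Email SYSTEM "email.dtd" [
 <!-- Local addition to the external DTD: a reusable signature -->
 <!ENTITY sig "Best regards, Adrian">
]>
<Email>
 <Head>
  <From name="Adrian Giurca" address="giurca@tu-cottbus.de"/>
  <To name="Mihai Gabroveanu" address="mihaiug@inf.ucv.ro"/>
  <Subject>Where is your chapter?</Subject>
 </Head>
 <Body>
  <Text>Where is the thesis chapter you promised me last week? &sig;</Text>
 </Body>
</Email>
```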