Introduction to XML Schema

Before reading this document, make sure that you are familiar with:

The purpose of XML Schema is to provide a language for defining different classes of XML documents. The term "instance document" is often used to describe an XML document that conforms to a particular schema. Notice that, unlike a DTD, an XML Schema is itself an XML document and can therefore be authored with any XML editor.


Let us consider an instance document in a file named email.xml. It describes an email generated by an email client application:

<Email>
 <Head>
  <From name="Adrian Giurca" address="giurca@tu-cottbus.de"/>
  <To address="mihaiug@inf.ucv.ro"/>
  <Subject>Where is your chapter?</Subject>
 </Head>
 <Body>
  <Text>
  Where is the thesis chapter you promised me last week?
  </Text>
 </Body>
</Email>

The email has a root element, Email, with the subelements Head and Body. These contain further subelements (From, To, Subject and Text), some of which carry attributes (such as name), until an element such as Subject or Text contains a string rather than any subelements.

Basic rules

  1. Elements that contain subelements or attributes are said to have complex types.
  2. Elements that contain strings (and numbers, and dates, etc.) but do not contain any subelements are said to have simple types.
  3. Attributes always have simple types.

The complex types, and some of the simple types, must be defined in the schema for emails. The other simple types are provided by XML Schema as built-in datatypes.

Notice, however, that in a typical email the Head may also contain a list of carbon-copy recipients (potential Cc elements) and a list of blind-carbon-copy recipients (potential Bcc elements), and the Body element may contain attachments (potential Attachment elements, each providing an attribute that gives the type of the attachment). The schema for emails must therefore model any potential email. Readers familiar with Document Type Definition grammars may take a look at such a DTD tailored to the above example.

Defining the Email (root) element and its type

 <xs:element name="Email" type="Email.Type"/>
 <xs:complexType name="Email.Type">
  <xs:sequence>
   <xs:element name="Head" type="Head.Type"/>
   <xs:element name="Body" type="Body.Type"/>
  </xs:sequence>
 </xs:complexType>

The root element Email consists of a sequence of two subelements, namely Head and Body. According to the basic rules, since it contains subelements, it must have a complex type.

As we can see in the example, each element definition provides the name (e.g. Email) of the element and its type (e.g. Email.Type).

Usually complex types are defined with a grouping construct:
  • xs:sequence - a sequence of subelements (the order matters)
  • xs:all - a set of subelements (the order doesn't matter)
  • xs:choice - a choice among subelements (exactly one of the alternatives appears)
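
To make the difference between xs:all and xs:choice concrete, here is a minimal sketch. All names in it (Settings.Type, Language, Signature, Priority.Type, Low, Normal, High) are hypothetical and are not part of the email schema developed in this document.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <!-- xs:all: both children must appear, in either order -->
 <xs:complexType name="Settings.Type">
  <xs:all>
   <xs:element name="Language" type="xs:string"/>
   <xs:element name="Signature" type="xs:string"/>
  </xs:all>
 </xs:complexType>
 <!-- xs:choice: exactly one of the three children appears -->
 <xs:complexType name="Priority.Type">
  <xs:choice>
   <xs:element name="Low" type="xs:string"/>
   <xs:element name="Normal" type="xs:string"/>
   <xs:element name="High" type="xs:string"/>
  </xs:choice>
 </xs:complexType>
</xs:schema>
```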


Defining Body.Type

The Body element is set to contain a sequence of one Text subelement followed by at most 5 Attachment elements.

 <xs:complexType name="Body.Type">
  <xs:sequence>
   <xs:element name="Text" type="xs:string"/>
   <xs:element name="Attachment" type="Attachment.Type" minOccurs="0" maxOccurs="5"/>
  </xs:sequence>
 </xs:complexType>

We can see that the Text subelement has a simple type, namely xs:string. This simple type is an XML Schema built-in type.

The Attachment element is declared with cardinality restrictions, i.e. by using the minOccurs and maxOccurs schema attributes.

The cardinality restrictions for elements supported by XML Schema are:

Meaning        XML Schema                              DTD
At most one    minOccurs="0" maxOccurs="1"             ?
One or more    minOccurs="1" maxOccurs="unbounded"     +
Zero or more   minOccurs="0" maxOccurs="unbounded"     *
Exactly one    minOccurs="1" maxOccurs="1" (default)   (no indicator)

Notice that the default cardinality constraints are minOccurs="1" and maxOccurs="1", so you may omit them when an element must occur exactly once. In addition, you should know that neither minOccurs nor maxOccurs may appear in the declarations of global elements.
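
As a small illustration of these defaults, the two type definitions in the following sketch describe the same content model. The names Implicit.Type, Explicit.Type and Note are hypothetical and serve only this example.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <!-- Defaults left implicit: Text occurs exactly once -->
 <xs:complexType name="Implicit.Type">
  <xs:sequence>
   <xs:element name="Text" type="xs:string"/>
  </xs:sequence>
 </xs:complexType>
 <!-- Same content model with the defaults written out -->
 <xs:complexType name="Explicit.Type">
  <xs:sequence>
   <xs:element name="Text" type="xs:string" minOccurs="1" maxOccurs="1"/>
  </xs:sequence>
 </xs:complexType>
 <!-- A global declaration carries neither minOccurs nor maxOccurs -->
 <xs:element name="Note" type="Implicit.Type"/>
</xs:schema>
```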


More on xs:all and xs:choice:

  • When specifying a set of subelements with the grouping construct xs:all you can only use 0 and 1 for minOccurs and maxOccurs (in XML Schema 1.0).
  • The xs:all grouping element does not allow further nesting of other grouping elements (e.g. you cannot have an xs:sequence within an xs:all).
  • In an xs:choice grouping element you can set the minOccurs and maxOccurs attributes so that the choice is repeated: each repetition selects one of the alternatives, and the number of repetitions must lie between these two bounds, e.g. <xs:choice minOccurs="0" maxOccurs="3">.
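
For instance, a repeated xs:choice could let recipient elements appear in any order. The type below is a hypothetical alternative, not the Head.Type used later in this document. It assumes the Address.Type defined further down and is meant to be placed inside the xs:schema element.

```xml
<!-- Hypothetical: zero to three recipient elements, mixed in any order -->
<xs:complexType name="Recipients.Type">
 <xs:choice minOccurs="0" maxOccurs="3">
  <xs:element name="To" type="Address.Type"/>
  <xs:element name="Cc" type="Address.Type"/>
  <xs:element name="Bcc" type="Address.Type"/>
 </xs:choice>
</xs:complexType>
```

With these bounds, an element of this type may be empty or may hold up to three To, Cc or Bcc children in any combination, for example Cc, To, Cc.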


Defining Attachment.Type

Let us assume that the Attachment element must provide a required attribute, type, with two possible values, mime or binhex, and a required attribute, filename, holding the name of the attached file. Since it carries only attributes and no content, Attachment is an empty element.

<xs:complexType name="Attachment.Type">
 <xs:attribute name="type" type="Mime.Type" use="required"/>
 <xs:attribute name="filename" type="Filename.Type" use="required"/>
</xs:complexType>

As we can see, attributes are declared by using xs:attribute and specifying a name (e.g. filename) and a type (e.g. Filename.Type) for that attribute. In addition, cardinality restrictions for the attribute can be set (e.g. use="required").

The XML Schema cardinality restrictions for attributes are:

Meaning      XML Schema         DTD
Required     use="required"     #REQUIRED
Optional     use="optional"     #IMPLIED
Prohibited   use="prohibited"   N/A

Defining Mime.Type and Filename.Type

<xs:simpleType name="Mime.Type">
 <xs:restriction base="xs:string">
  <xs:enumeration value="mime"/>
  <xs:enumeration value="binhex"/>
 </xs:restriction>
</xs:simpleType>

<xs:simpleType name="Filename.Type">
 <xs:restriction base="xs:string">
  <xs:pattern value="[a-zA-Z]+\.([jJ][pP][gG]|[gG][iI][fF])"/>
 </xs:restriction>
</xs:simpleType>

The Mime.Type type consists of two possible values, mime and binhex. This can be encoded in XML Schema as an enumeration. An enumeration is a simple type derived by restriction from another simple type by enumerating the possible values.

XML Schema allows regular expressions in the definitions of simple types. In the above design, Filename.Type is a simple type restricted from xs:string by a pattern written as a regular expression. Note that an XML Schema pattern always has to match the whole value, so the anchors ^ and $ known from other regular expression dialects are not used.
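
The instance fragments below sketch how these two types behave for Attachment elements. They assume that Filename.Type is meant to accept a name made of letters followed by a .jpg or .gif extension in any letter case. The file names themselves are made up for illustration.

```xml
<!-- Expected to be valid: both required attributes, allowed values -->
<Attachment type="mime" filename="photo.jpg"/>
<Attachment type="binhex" filename="Logo.GIF"/>

<!-- Expected to be invalid: "zip" is not among the enumerated values -->
<Attachment type="zip" filename="photo.jpg"/>

<!-- Expected to be invalid: the .png extension does not match the pattern -->
<Attachment type="mime" filename="photo.png"/>

<!-- Expected to be invalid: the required filename attribute is missing -->
<Attachment type="mime"/>
```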

Defining Head.Type

The Head element must contain:

  • Exactly one From subelement followed by:
    • a sequence of zero or more To elements,
    • a sequence of zero or more Cc elements,
    • then a sequence of zero or more Bcc elements
    • and, finally, one Subject element.

Therefore its type looks as follows:

<xs:complexType name="Head.Type">
 <xs:sequence>
  <xs:element name="From" type="Address.Type"/>
  <xs:sequence minOccurs="0" maxOccurs="unbounded">
   <xs:element name="To" type="Address.Type"/>
  </xs:sequence>
  <xs:sequence minOccurs="0" maxOccurs="unbounded">
   <xs:element name="Cc" type="Address.Type"/>
  </xs:sequence>
  <xs:sequence minOccurs="0" maxOccurs="unbounded">
   <xs:element name="Bcc" type="Address.Type"/>
  </xs:sequence>
  <xs:element name="Subject" type="xs:string"/>
 </xs:sequence>
</xs:complexType>

You may notice that cardinality restrictions can also be applied to the xs:sequence element of XML Schema.
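
For a single repeated element, the same content model can usually be written more compactly by putting the occurrence attributes on the element declaration itself. The following is a sketch of an equivalent formulation, not the definition used in this document. HeadCompact.Type is a hypothetical name.

```xml
<xs:complexType name="HeadCompact.Type">
 <xs:sequence>
  <xs:element name="From" type="Address.Type"/>
  <!-- minOccurs and maxOccurs placed directly on each local element -->
  <xs:element name="To" type="Address.Type" minOccurs="0" maxOccurs="unbounded"/>
  <xs:element name="Cc" type="Address.Type" minOccurs="0" maxOccurs="unbounded"/>
  <xs:element name="Bcc" type="Address.Type" minOccurs="0" maxOccurs="unbounded"/>
  <xs:element name="Subject" type="xs:string"/>
 </xs:sequence>
</xs:complexType>
```

Wrapping declarations in a nested xs:sequence becomes necessary when a group of several elements has to repeat as a unit.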

Defining Address.Type

As the instance document example shows, the From, To, Cc and Bcc elements are similar, so they are all assigned the same type, Address.Type. Each of these elements has an optional name attribute and a required address attribute. The value of the address attribute must be a valid email address.

<xs:complexType name="Address.Type">
 <xs:attribute name="name" type="xs:string" use="optional"/>
 <xs:attribute name="address" use="required">
  <xs:simpleType>
   <xs:restriction base="xs:string">
    <xs:pattern value="[\w-]+@([\w-]+\.)+[\w-]+"/>
   </xs:restriction>
  </xs:simpleType>
 </xs:attribute>
</xs:complexType>

In the above type definition I used just a basic regular expression for email validation. Much more complex expressions can be written.
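
A few instance fragments may help to see what this type accepts. They are a sketch based on the pattern shown above. The first two addresses come from the example email, and the rejected values are made up.

```xml
<!-- Expected to be valid: name is optional, address matches the pattern -->
<To address="mihaiug@inf.ucv.ro"/>
<From name="Adrian Giurca" address="giurca@tu-cottbus.de"/>

<!-- Expected to be invalid: the address value contains no "@" -->
<Cc address="giurca.tu-cottbus.de"/>

<!-- Expected to be invalid: the required address attribute is missing -->
<Bcc name="Adrian Giurca"/>
```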

The Complete Schema

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="Email" type="Email.Type"/>
 <xs:complexType name="Email.Type">
  <xs:sequence>
   <xs:element name="Head" type="Head.Type"/>
   <xs:element name="Body" type="Body.Type"/>
  </xs:sequence>
 </xs:complexType>
 <xs:complexType name="Body.Type">
  <xs:sequence>
   <xs:element name="Text" type="xs:string"/>
   <xs:element name="Attachment" type="Attachment.Type" minOccurs="0" maxOccurs="5"/>
  </xs:sequence>
 </xs:complexType>
 <xs:complexType name="Attachment.Type">
  <xs:attribute name="type" type="Mime.Type" use="required"/>
  <xs:attribute name="filename" type="Filename.Type" use="required"/>
 </xs:complexType>
 <xs:simpleType name="Mime.Type">
  <xs:restriction base="xs:string">
   <xs:enumeration value="mime"/>
   <xs:enumeration value="binhex"/>
  </xs:restriction>
 </xs:simpleType>
 <xs:simpleType name="Filename.Type">
  <xs:restriction base="xs:string">
   <xs:pattern value="[a-zA-Z]+\.([jJ][pP][gG]|[gG][iI][fF])"/>
  </xs:restriction>
 </xs:simpleType>
 <xs:complexType name="Head.Type">
  <xs:sequence>
   <xs:element name="From" type="Address.Type"/>
   <xs:sequence minOccurs="0" maxOccurs="unbounded">
    <xs:element name="To" type="Address.Type"/>
   </xs:sequence>
   <xs:sequence minOccurs="0" maxOccurs="unbounded">
    <xs:element name="Cc" type="Address.Type"/>
   </xs:sequence>
   <xs:sequence minOccurs="0" maxOccurs="unbounded">
    <xs:element name="Bcc" type="Address.Type"/>
   </xs:sequence>
   <xs:element name="Subject" type="xs:string"/>
  </xs:sequence>
 </xs:complexType>
 <xs:complexType name="Address.Type">
  <xs:attribute name="name" type="xs:string" use="optional"/>
  <xs:attribute name="address" use="required">
   <xs:simpleType>
    <xs:restriction base="xs:string">
     <xs:pattern value="[\w-]+@([\w-]+\.)+[\w-]+"/>
    </xs:restriction>
   </xs:simpleType>
  </xs:attribute>
 </xs:complexType>
</xs:schema>
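
To have a validating parser check an instance against this schema, the instance can point to the schema file from its root element. The sketch below assumes that the schema is saved as email.xsd next to the instance (a hypothetical file name). Since the schema declares no target namespace, the xsi:noNamespaceSchemaLocation attribute is used. The Cc address is invented for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<Email xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="email.xsd">
 <Head>
  <From name="Adrian Giurca" address="giurca@tu-cottbus.de"/>
  <To address="mihaiug@inf.ucv.ro"/>
  <!-- Invented recipient, showing an optional Cc element -->
  <Cc address="office@example.org"/>
  <Subject>Where is your chapter?</Subject>
 </Head>
 <Body>
  <Text>Where is the thesis chapter you promised me last week?</Text>
 </Body>
</Email>
```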

