Introduction to XML Schema
Before reading this document, make sure that you are familiar with:
The purpose of XML Schema is to provide a language for defining classes of XML documents. The term "instance document" is often used to describe an XML document that conforms to a particular schema. Notice that, unlike a DTD, an XML Schema is itself an XML document and can therefore be authored with any XML editor.
Let us consider an instance document in a file named email.xml. It describes an email generated by an email client application:
<Email>
  <Head>
    <From name="Adrian Giurca" address="giurca@tu-cottbus.de"/>
    <To address="mihaiug@inf.ucv.ro"/>
    <Subject>Where is your chapter?</Subject>
  </Head>
  <Body>
    <Text>
      Where is the thesis chapter you promised me last week?
    </Text>
  </Body>
</Email>
The email has a root element, Email, with the subelements Head and Body. These subelements in turn contain other subelements (such as From, To, Subject and Text) and attributes (such as name), down to elements such as Subject and Text, which contain a string rather than any subelements.
Basic rules
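- Elements that contain subelements or carry attributes have complex types.
- Elements that contain only character data (strings, numbers, dates) and carry no attributes have simple types.
- Attributes always have simple types.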
The complex types, and some of the simple types, must be defined in the schema for emails. The other simple types are XML Schema built-in datatypes.
Notice, however, that in a typical email the Head may also contain a list of carbon-copy receiver addresses (potential Cc elements) and a list of blind-carbon-copy addresses (potential Bcc elements), and the Body element may contain attachments (potential Attachment elements, each providing a mime-type attribute of the attachment). Therefore the schema for emails must model any potential email. Readers familiar with Document Type Definition grammars may take a look at a DTD tailored to the above example.
Defining the Email (root) element and its type
<xs:element name="Email" type="Email.Type"/>
<xs:complexType name="Email.Type">
  <xs:sequence>
    <xs:element name="Head" type="Head.Type"/>
    <xs:element name="Body" type="Body.Type"/>
  </xs:sequence>
</xs:complexType>
The root element Email consists of a sequence of two subelements, namely Head and Body. According to the basic rules, it must have a complex type because it contains subelements.
As we can see in the example, each element definition provides the name (e.g. Email) of the element and its type (e.g. Email.Type).
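Usually complex types are defined with a grouping construct:
- xs:sequence: the subelements must appear in the declared order;
- xs:all: the subelements may appear in any order;
- xs:choice: exactly one of the listed subelements appears.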
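As a rough illustration of the grouping constructs xs:sequence, xs:all and xs:choice, the sketch below declares the same two subelements under each of them. The type names Pair.Sequence, Pair.All and Pair.Choice and the element names First and Second are hypothetical and do not belong to the email schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical types, for illustration only -->

  <!-- xs:sequence: First must be followed by Second -->
  <xs:complexType name="Pair.Sequence">
    <xs:sequence>
      <xs:element name="First" type="xs:string"/>
      <xs:element name="Second" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>

  <!-- xs:all: First and Second must both appear, in either order -->
  <xs:complexType name="Pair.All">
    <xs:all>
      <xs:element name="First" type="xs:string"/>
      <xs:element name="Second" type="xs:string"/>
    </xs:all>
  </xs:complexType>

  <!-- xs:choice: either First or Second appears, but not both -->
  <xs:complexType name="Pair.Choice">
    <xs:choice>
      <xs:element name="First" type="xs:string"/>
      <xs:element name="Second" type="xs:string"/>
    </xs:choice>
  </xs:complexType>
</xs:schema>
```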
Defining Body.Type
The Body element contains a sequence of exactly one Text subelement followed by at most five Attachment elements.
<xs:complexType name="Body.Type">
  <xs:sequence>
    <xs:element name="Text" type="xs:string"/>
    <xs:element name="Attachment" type="Attachment.Type" minOccurs="0" maxOccurs="5"/>
  </xs:sequence>
</xs:complexType>
We can see that the Text subelement has a simple type, namely xs:string, which is an XML Schema built-in type.
The Attachment element is declared with cardinality restrictions, i.e. by using the minOccurs and maxOccurs schema attributes.
The cardinality restrictions for elements supported by XML Schema are:
| Meaning | XML Schema | DTD |
|---|---|---|
| at most one | minOccurs="0" maxOccurs="1" | ? |
| one or more | minOccurs="1" maxOccurs="unbounded" | + |
| zero or more | minOccurs="0" maxOccurs="unbounded" | * |
| exactly one | minOccurs="1" maxOccurs="1" | (no symbol) |
Notice that the default cardinality constraints are minOccurs="1" and maxOccurs="1", so you may omit them. In addition, neither minOccurs nor maxOccurs may appear in the declarations of global elements.
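To make the cardinality rules concrete, here is a sketch of a Body instance that should be accepted by Body.Type: Text appears exactly once (the default), and Attachment appears twice, within the declared range of 0 to 5. The message text and file names are made up for the example, and the attachments are assumed to satisfy Attachment.Type, which is defined further below.

```xml
<Body>
  <!-- exactly one Text: minOccurs and maxOccurs default to 1 -->
  <Text>Please find the figures attached.</Text>
  <!-- between 0 and 5 Attachment elements are allowed -->
  <Attachment type="mime" filename="figure.jpg"/>
  <Attachment type="binhex" filename="chart.gif"/>
</Body>
```

Omitting Text, repeating it, or supplying a sixth Attachment would make the fragment invalid.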
More on xs:all and xs:choice:
- When specifying a set of subelements with the grouping construct xs:all, you can only use 0 and 1 for minOccurs and maxOccurs.
- The xs:all grouping element does not allow further nesting of other grouping elements (e.g. you cannot have an xs:sequence within an xs:all).
- In an xs:choice grouping element you can set the minOccurs and maxOccurs attributes in order to allow for any number of possible child elements between these two boundaries, e.g. <xs:choice minOccurs="0" maxOccurs="3">.
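The following sketch shows how occurrence attributes interact with xs:all and xs:choice. Both types, Signature.Type and Recipients.Type, and their element names are hypothetical and are not part of the email schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type: Name is required, Phone is optional, in any order.
       Inside xs:all only the occurrence values 0 and 1 are used. -->
  <xs:complexType name="Signature.Type">
    <xs:all>
      <xs:element name="Name" type="xs:string"/>
      <xs:element name="Phone" type="xs:string" minOccurs="0"/>
    </xs:all>
  </xs:complexType>

  <!-- Hypothetical type: zero to three children, each one either
       a To or a Cc element, in any mix -->
  <xs:complexType name="Recipients.Type">
    <xs:choice minOccurs="0" maxOccurs="3">
      <xs:element name="To" type="xs:string"/>
      <xs:element name="Cc" type="xs:string"/>
    </xs:choice>
  </xs:complexType>
</xs:schema>
```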
Defining Attachment.Type
Let us assume that the Attachment element must provide a required attribute, type, with two possible values, mime or binhex, and a required attribute, filename, holding the name of the attached file.
Accordingly, Attachment is an empty element.
<xs:complexType name="Attachment.Type">
  <xs:attribute name="type" type="Mime.Type" use="required"/>
  <xs:attribute name="filename" type="Filename.Type" use="required"/>
</xs:complexType>
As we can see, attribute definitions are made with xs:attribute by specifying a name (e.g. filename) and a type (e.g. Filename.Type) for the attribute. In addition, a cardinality restriction can be set for the attribute (e.g. use="required").
The XML Schema cardinality restrictions for attributes are:
| Meaning | XML Schema | DTD |
|---|---|---|
| Required | use="required" | #REQUIRED |
| Optional | use="optional" | #IMPLIED |
| Prohibited | use="prohibited" | N/A |
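The effect of use="required" is easiest to see on instance fragments. In the sketch below, only the first Attachment should be accepted by Attachment.Type. The Examples wrapper element and the size attribute are hypothetical, and the file name photo.jpg is a made-up value assumed to satisfy Filename.Type.

```xml
<!-- Examples is a hypothetical wrapper, used only to keep the fragment well-formed -->
<Examples>
  <!-- Accepted: both required attributes are present -->
  <Attachment type="mime" filename="photo.jpg"/>

  <!-- Rejected: the required filename attribute is missing -->
  <Attachment type="mime"/>

  <!-- Rejected: size is not declared in Attachment.Type -->
  <Attachment type="mime" filename="photo.jpg" size="1024"/>
</Examples>
```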
Defining Mime.Type and Filename.Type
<xs:simpleType name="Mime.Type">
  <xs:restriction base="xs:string">
    <xs:enumeration value="mime"/>
    <xs:enumeration value="binhex"/>
  </xs:restriction>
</xs:simpleType>
<xs:simpleType name="Filename.Type">
  <xs:restriction base="xs:string">
    <xs:pattern value="[a-zA-Z]+\.(((j|J)(p|P)(g|G))|((g|G)(i|I)(f|F)))"/>
  </xs:restriction>
</xs:simpleType>
The Mime.Type type consists of two possible values, mime and binhex. This can be encoded in XML Schema as an enumeration. An enumeration is a simple type restricted from another simple type by enumerating the possible values.
XML Schema allows regular expressions in the definitions of simple types. In the above design the Filename.Type is a simple type restricted from xs:string by a specific pattern constructed with the help of a regular expression. The pattern is intended to accept alphabetic file names with a .jpg or .gif extension, in any letter case.
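One detail worth keeping in mind is that an XML Schema pattern is matched against the entire value, so the ^ and $ anchors known from other regular expression dialects are not needed. The sketch below illustrates this together with an enumeration; the types Priority.Type and MessageId.Type are hypothetical and are not part of the email schema.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical enumeration: only these three strings are valid -->
  <xs:simpleType name="Priority.Type">
    <xs:restriction base="xs:string">
      <xs:enumeration value="low"/>
      <xs:enumeration value="normal"/>
      <xs:enumeration value="high"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical pattern: four digits, a hyphen, two upper-case letters.
       The pattern must match the whole value, so "2024-AB" is accepted
       while "x2024-AB" and "2024-ABC" are rejected. -->
  <xs:simpleType name="MessageId.Type">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{4}-[A-Z]{2}"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```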
Defining Head.Type
The Head element must contain:
- exactly one From subelement, followed by:
  - a sequence of zero or more To elements,
  - a sequence of zero or more Cc elements,
  - then a sequence of zero or more Bcc elements,
  - and, finally, one Subject element.
Therefore its type looks as follows:
<xs:complexType name="Head.Type">
  <xs:sequence>
    <xs:element name="From" type="Address.Type"/>
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element name="To" type="Address.Type"/>
    </xs:sequence>
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element name="Cc" type="Address.Type"/>
    </xs:sequence>
    <xs:sequence minOccurs="0" maxOccurs="unbounded">
      <xs:element name="Bcc" type="Address.Type"/>
    </xs:sequence>
    <xs:element name="Subject" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
You may notice that cardinality restrictions may also be applied to the xs:sequence element of XML Schema.
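A common alternative, sketched here under the hypothetical name Head.Compact.Type, places minOccurs and maxOccurs directly on the element declarations instead of wrapping each one in its own xs:sequence. The two forms should accept the same documents. The fragment is meant to be placed inside the xs:schema element of the email schema, since it relies on Address.Type.

```xml
<!-- Hypothetical variant of Head.Type; Address.Type comes from the email schema -->
<xs:complexType name="Head.Compact.Type">
  <xs:sequence>
    <xs:element name="From" type="Address.Type"/>
    <!-- zero or more of each recipient kind, in this fixed order -->
    <xs:element name="To" type="Address.Type" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="Cc" type="Address.Type" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="Bcc" type="Address.Type" minOccurs="0" maxOccurs="unbounded"/>
    <xs:element name="Subject" type="xs:string"/>
  </xs:sequence>
</xs:complexType>
```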
Defining Address.Type
As the instance document example shows, the From, To, Cc and Bcc elements are similar, so they are all assigned the same type, Address.Type. Each of these elements may provide an optional name attribute and must provide a required address attribute. The value of the address attribute must be a valid email address.
<xs:complexType name="Address.Type">
  <xs:attribute name="name" type="xs:string" use="optional"/>
  <xs:attribute name="address" use="required">
    <xs:simpleType>
      <xs:restriction base="xs:string">
        <xs:pattern value="[\w-]+@([\w-]+\.)+[\w-]+"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:attribute>
</xs:complexType>
In the above type definition I used just a basic regular expression for email validation. Much more complex expressions can be written.
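If the same address format is needed in more than one place, the anonymous simple type can be given a name and referenced through the type attribute. The following sketch uses the hypothetical names EmailAddress.Type and Address.Named.Type; the pattern is copied unchanged from the definition above.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical named simple type holding the address pattern.
       Being basic, the pattern rejects some real addresses,
       e.g. those with a dot before the @ sign. -->
  <xs:simpleType name="EmailAddress.Type">
    <xs:restriction base="xs:string">
      <xs:pattern value="[\w-]+@([\w-]+\.)+[\w-]+"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical variant of Address.Type that references the named type -->
  <xs:complexType name="Address.Named.Type">
    <xs:attribute name="name" type="xs:string" use="optional"/>
    <xs:attribute name="address" type="EmailAddress.Type" use="required"/>
  </xs:complexType>
</xs:schema>
```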
The Complete Schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Email" type="Email.Type"/>
  <xs:complexType name="Email.Type">
    <xs:sequence>
      <xs:element name="Head" type="Head.Type"/>
      <xs:element name="Body" type="Body.Type"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Body.Type">
    <xs:sequence>
      <xs:element name="Text" type="xs:string"/>
      <xs:element name="Attachment" type="Attachment.Type" minOccurs="0" maxOccurs="5"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Attachment.Type">
    <xs:attribute name="type" type="Mime.Type" use="required"/>
    <xs:attribute name="filename" type="Filename.Type" use="required"/>
  </xs:complexType>
  <xs:simpleType name="Mime.Type">
    <xs:restriction base="xs:string">
      <xs:enumeration value="mime"/>
      <xs:enumeration value="binhex"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="Filename.Type">
    <xs:restriction base="xs:string">
      <xs:pattern value="[a-zA-Z]+\.(((j|J)(p|P)(g|G))|((g|G)(i|I)(f|F)))"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="Head.Type">
    <xs:sequence>
      <xs:element name="From" type="Address.Type"/>
      <xs:sequence minOccurs="0" maxOccurs="unbounded">
        <xs:element name="To" type="Address.Type"/>
      </xs:sequence>
      <xs:sequence minOccurs="0" maxOccurs="unbounded">
        <xs:element name="Cc" type="Address.Type"/>
      </xs:sequence>
      <xs:sequence minOccurs="0" maxOccurs="unbounded">
        <xs:element name="Bcc" type="Address.Type"/>
      </xs:sequence>
      <xs:element name="Subject" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="Address.Type">
    <xs:attribute name="name" type="xs:string" use="optional"/>
    <xs:attribute name="address" use="required">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:pattern value="[\w-]+@([\w-]+\.)+[\w-]+"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:attribute>
  </xs:complexType>
</xs:schema>
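An instance document that exercises the optional parts of the schema might look as follows. This is only a sketch: the schema file name email.xsd, the Cc address and the attachment file name are assumptions made for the example, and the attachment is assumed to satisfy Filename.Type.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- email.xsd is an assumed file name for the schema above -->
<Email xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="email.xsd">
  <Head>
    <From name="Adrian Giurca" address="giurca@tu-cottbus.de"/>
    <To address="mihaiug@inf.ucv.ro"/>
    <!-- optional Cc; the address is a made-up example -->
    <Cc address="someone@example.org"/>
    <Subject>Where is your chapter?</Subject>
  </Head>
  <Body>
    <Text>Where is the thesis chapter you promised me last week?</Text>
    <!-- optional attachment; the file name is a made-up example -->
    <Attachment type="mime" filename="outline.gif"/>
  </Body>
</Email>
```

Because the schema declares no target namespace, the instance points to it with xsi:noNamespaceSchemaLocation rather than xsi:schemaLocation.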

