Validating XML against XSD in PHP
It’s 2017 and you still prefer XML (Extensible Markup Language) to JSON (JavaScript Object Notation) for data interchange? Well, that is not the point of this article; the trade-offs between the two have been discussed at length elsewhere. But a lot of systems still use XML today, and I can assure you that will still be the case years from now.
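Validating XML against XSD might be the first step to take, especially when building a feed reader or ingester. For starters, any file like the sample below is well-formed XML: it follows XML's syntax rules. Well-formedness alone, however, says nothing about which elements must appear, in what order, or what type their values have; that is what validating against a schema checks.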
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <buildtime>2002-05-30T09:30:10.5</buildtime>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer's Guide</title>
    <genre>Computer</genre>
    <price>44.95</price>
    <publish_date>2000-10-01</publish_date>
    <description>An in-depth look at creating applications
    with XML.</description>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <genre>Fantasy</genre>
    <price>5.95</price>
    <publish_date>2000-12-16</publish_date>
    <description>A former architect battles corporate zombies,
    an evil sorceress, and her own childhood to become queen
    of the world.</description>
  </book>
</catalog>
And below is a sample XSD (XML Schema Definition) file describing that structure:
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element type="xs:dateTime" name="buildtime"/>
        <xs:element name="book" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="author"/>
              <xs:element type="xs:string" name="title"/>
              <xs:element type="xs:string" name="genre"/>
              <xs:element type="xs:float" name="price"/>
              <xs:element type="xs:date" name="publish_date"/>
              <xs:element type="xs:string" name="description"/>
            </xs:sequence>
            <xs:attribute type="xs:string" name="id" use="optional"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
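To see the difference between well-formed and valid in practice, here is a minimal sketch using DOMDocument::loadXML() and DOMDocument::schemaValidateSource(). The inline XML and XSD are hypothetical, cut-down stand-ins for the samples above, embedded as strings so the snippet runs on its own. The document parses without complaint, but validation rejects it because "yesterday" is not an xs:dateTime value.

```php
<?php
// Sketch: a well-formed document that is still invalid against its schema.
// The XSD and XML below are hypothetical, cut-down versions of the samples.
$xsd = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element type="xs:dateTime" name="buildtime"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// Syntactically fine, but "yesterday" is not an xs:dateTime value
$xml = '<catalog><buildtime>yesterday</buildtime></catalog>';

// Buffer libxml problems instead of emitting PHP warnings
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$wellFormed = $doc->loadXML($xml);         // parsing checks syntax only
$valid = $doc->schemaValidateSource($xsd); // validation checks structure and types

// Copy the messages out, then clear libxml's error buffer
$messages = array_map(function ($error) {
    return trim($error->message);
}, libxml_get_errors());
libxml_clear_errors();

echo $wellFormed ? "well-formed\n" : "not well-formed\n";
echo $valid ? "valid\n" : "invalid\n";
foreach ($messages as $message) {
    echo $message, "\n";
}
```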
Writing an XSD for your XML is not hard: you can generate one with tools like FreeFormatter or learn the basics through a crash course on W3Schools.
Now we are ready to validate our XML file against the XSD using either DOMDocument or XMLReader. First, make sure the required extensions (dom, xmlreader and libxml) are enabled in your PHP installation.
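One way to check this from PHP itself is extension_loaded(). The sketch below assumes the usual extension names (dom, xmlreader and libxml) and simply reports which ones are missing; running php -m on the command line lists the same modules.

```php
<?php
// Environment check: DOMDocument comes from the "dom" extension, XMLReader
// from "xmlreader", and both are built on top of "libxml".
$required = ['libxml', 'dom', 'xmlreader'];

$missing = array_values(array_filter($required, function ($ext) {
    return !extension_loaded($ext);
}));

if (count($missing) > 0) {
    echo 'Missing extensions: ' . implode(', ', $missing) . "\n";
} else {
    // LIBXML_DOTTED_VERSION is defined by the libxml extension
    echo 'All extensions loaded; libxml ' . LIBXML_DOTTED_VERSION . "\n";
}
```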
Validating With DOMDocument
<?php
class DOMValidator
{
    /**
     * @var string
     */
    protected $feedSchema = __DIR__ . '/sample.xsd';
    /**
     * @var \DOMDocument
     */
    protected $handler;
    /**
     * @var int
     */
    public $feedErrors = 0;
    /**
     * Formatted libxml Error details
     *
     * @var array
     */
    public $errorDetails;
    /**
     * Validation Class constructor Instantiating DOMDocument
     */
    public function __construct()
    {
        $this->handler = new \DOMDocument('1.0', 'utf-8');
    }
    /**
     * @param \LibXMLError $error
     *
     * @return string
     */
    private function libxmlDisplayError($error)
    {
        $errorString = "Error $error->code in $error->file (Line:{$error->line}): ";
        $errorString .= trim($error->message);
        return $errorString;
    }
    /**
     * @return array
     */
    private function libxmlDisplayErrors()
    {
        $errors = libxml_get_errors();
        $result = [];
        foreach ($errors as $error) {
            $result[] = $this->libxmlDisplayError($error);
        }
        libxml_clear_errors();
        return $result;
    }
    /**
     * Validate Incoming Feeds against Listing Schema
     *
     * @param string $feeds Path to the XML file
     *
     * @return bool
     *
     * @throws \Exception
     */
    public function validateFeeds($feeds)
    {
        if (!class_exists('DOMDocument')) {
            // DOMException is itself part of the dom extension, so use a core exception
            throw new \RuntimeException("'DOMDocument' class not found!");
        }
        if (!file_exists($this->feedSchema)) {
            throw new \Exception('Schema is Missing, Please add schema to feedSchema property');
        }
        // Collect libxml errors instead of printing warnings
        libxml_use_internal_errors(true);
        $contents = file_get_contents($feeds);
        if ($contents === false) {
            throw new \Exception("Could not open XML input: $feeds");
        }
        // A malformed file fails at loadXML(); its parse errors are reported the same way
        if (!$this->handler->loadXML($contents, LIBXML_NOBLANKS)
            || !$this->handler->schemaValidate($this->feedSchema)) {
            $this->errorDetails = $this->libxmlDisplayErrors();
            $this->feedErrors = 1;
            return false;
        }
        // The file is valid
        return true;
    }
    /**
     * Display Error if Resource is not validated
     *
     * @return array
     */
    public function displayErrors()
    {
        return $this->errorDetails;
    }
}
The DOMValidator class can be used like so:
<?php
$validator = new DOMValidator;
$validated = $validator->validateFeeds('sample.xml');
if ($validated) {
    echo "Feed successfully validated";
} else {
    print_r($validator->displayErrors());
}
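The code above is straightforward. The most important method is validateFeeds(): it loads the XML into a DOMDocument, calls schemaValidate() with the path to the XSD and, if validation fails, turns libxml's buffered errors into readable strings through libxmlDisplayErrors().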
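As a rough illustration of what validateFeeds() boils down to, here is a hypothetical standalone function, validateXmlString(), that takes the XML and the XSD as strings rather than file paths (an assumption made so the sketch is self-contained). It uses the same libxml calls as the class, with schemaValidateSource() standing in for schemaValidate() because the schema is a string here.

```php
<?php
// Hypothetical helper condensing validateFeeds(): parse, validate against an
// XSD, and turn libxml's error buffer into readable strings.
function validateXmlString(string $xml, string $xsd): array
{
    $previous = libxml_use_internal_errors(true); // buffer errors, no warnings
    libxml_clear_errors();                         // drop leftovers from earlier calls

    $doc = new DOMDocument('1.0', 'utf-8');
    // Only validate if the document parsed; a failed parse already left errors
    if ($doc->loadXML($xml, LIBXML_NOBLANKS)) {
        $doc->schemaValidateSource($xsd);
    }

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // Similar shape to libxmlDisplayError(): code, line, then the message
        $messages[] = sprintf('Error %d (Line:%d): %s', $error->code, $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);         // restore the caller's setting

    return $messages; // an empty array means the XML is valid
}

// Hypothetical one-element schema reusing the xs:float type of <price>
$xsd = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="price" type="xs:float"/>
</xs:schema>
XSD;

$ok = validateXmlString('<price>44.95</price>', $xsd);
$bad = validateXmlString('<price>cheap</price>', $xsd);

echo count($ok) === 0 ? "valid\n" : "invalid\n";
print_r($bad);
```

Restoring the previous libxml_use_internal_errors() setting is a choice made in this sketch so that calling the helper does not silently change error handling elsewhere in an application.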
Validating With XMLReader
The upside of XMLReader over DOMDocument is scalability: XMLReader is a pull parser that moves through the document node by node instead of loading the whole tree into memory, so it copes with very large files much better. Our class will be very similar to the DOMDocument one. Also make sure your libxml version is 2.6 or later.
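The streaming approach can also be sketched in isolation. The hypothetical streamValidate() function below feeds a string to XMLReader::XML() (open() is the equivalent for a file on disk), attaches a schema with setSchema(), which expects a file path and hence the temporary file, and then walks every node before inspecting the collected errors. The small catalog/price schema is an assumption for illustration only.

```php
<?php
// Sketch of streaming validation with XMLReader; streamValidate() is a
// hypothetical helper, not part of any library.
function streamValidate(string $xml, string $xsdPath): array
{
    $previous = libxml_use_internal_errors(true); // buffer errors, no warnings
    libxml_clear_errors();

    $reader = new XMLReader();
    $reader->XML($xml);            // for a file on disk: $reader->open($path)
    $reader->setSchema($xsdPath);  // only allowed before the first read()

    $valid = true;
    // Walk every node: the schema is checked incrementally as the reader
    // advances, so the whole tree is never built in memory.
    while ($reader->read()) {
        if (!$reader->isValid()) {
            $valid = false;
        }
    }
    $reader->close();

    $messages = array_map(function ($error) {
        return trim($error->message);
    }, libxml_get_errors());
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Fall back to a generic message if libxml reported nothing specific
    if (!$valid && count($messages) === 0) {
        $messages[] = 'Document is not valid';
    }
    return $messages; // empty array: the document is valid
}

// Hypothetical schema: a catalog holding one or more float prices
$xsd = <<<'XSD'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="catalog">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="price" type="xs:float" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
XSD;

// setSchema() takes a path, so write the schema to a temporary file
$xsdPath = tempnam(sys_get_temp_dir(), 'xsd');
file_put_contents($xsdPath, $xsd);

$good = streamValidate('<catalog><price>5.95</price><price>44.95</price></catalog>', $xsdPath);
$bad = streamValidate('<catalog><price>5.95</price><price>free</price></catalog>', $xsdPath);
unlink($xsdPath);

echo count($good) === 0 ? "valid\n" : "invalid\n";
print_r($bad);
```

Note that the loop never returns early: validity is only known once the last node has been read.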
<?php
class XmlValidator
{
    /**
     * @var string
     */
    protected $feedSchema = __DIR__ . '/sample.xsd';
    /**
     * @var \XMLReader
     */
    protected $handler;
    /**
     * @var int
     */
    public $feedErrors = 0;
    /**
     * Formatted libxml Error details
     *
     * @var array
     */
    public $errorDetails;
    /**
     * Validation Class constructor Instantiating XMLReader
     */
    public function __construct()
    {
        $this->handler = new \XMLReader();
    }
    /**
     * @param \LibXMLError $error
     *
     * @return string
     */
    private function libxmlDisplayError($error)
    {
        $errorString = "Error $error->code in $error->file (Line:{$error->line}): ";
        $errorString .= trim($error->message);
        return $errorString;
    }
    /**
     * @return array
     */
    private function libxmlDisplayErrors()
    {
        $errors = libxml_get_errors();
        $result = [];
        foreach ($errors as $error) {
            $result[] = $this->libxmlDisplayError($error);
        }
        libxml_clear_errors();
        return $result;
    }
    /**
     * Validate Incoming Feeds against Listing Schema
     *
     * @param string $feeds Path to the XML file
     *
     * @return bool
     *
     * @throws \Exception
     */
    public function validateFeeds($feeds)
    {
        if (!class_exists('XMLReader')) {
            throw new \RuntimeException("'XMLReader' class not found!");
        }
        if (!file_exists($this->feedSchema)) {
            throw new \Exception('Schema is Missing, Please add schema to feedSchema property');
        }
        // Enable internal errors first so problems from open() and setSchema() are captured
        libxml_use_internal_errors(true);
        if (!$this->handler->open($feeds)) {
            throw new \Exception("Could not open XML input: $feeds");
        }
        // The schema must be set after open() and before the first read()
        $this->handler->setSchema($this->feedSchema);
        $valid = true;
        // Read the whole document; returning on the first valid node would skip the rest
        while ($this->handler->read()) {
            if (!$this->handler->isValid()) {
                $valid = false;
            }
        }
        $this->handler->close();
        // Malformed XML ends read() early; those parse errors also land in libxml's buffer
        $errors = $this->libxmlDisplayErrors();
        if (!$valid || count($errors) > 0) {
            $this->errorDetails = $errors;
            $this->feedErrors = 1;
            return false;
        }
        return true;
    }
    /**
     * Display Error if Resource is not validated
     *
     * @return array
     */
    public function displayErrors()
    {
        return $this->errorDetails;
    }
}
This class is used the same way as the DOMValidator class:
<?php
$validator = new XmlValidator;
$validated = $validator->validateFeeds('sample.xml');
if ($validated) {
    echo "Feed successfully validated";
} else {
    print_r($validator->displayErrors());
}
So that’s it. I hope to follow this up with a post on ingesting feeds very soon. You can reach me at surajudeen.akande@andela.com with feedback; I would appreciate it.
If anyone is having trouble manipulating the XML file after validating, change the while loop in your validateFeeds method to this: