A better HTML5 parser for PHP

March 31, 2016
Ivo Petkov
HTML and PHP have existed for a long time, and one of the main use cases for PHP is rendering HTML. Unfortunately, parsing and modifying HTML in PHP is not easy, especially HTML5. That's why I created a simple library that helps you do that, and more. It extends PHP's native DOMDocument class, so it feels familiar to use. DOMDocument is very powerful, but it doesn't work well with HTML5. So, meet HTML5DOMDocument, an open source library that extends DOMDocument, fixes some issues and adds some functionality.

The fixes

Preserves whitespace - the DOMDocument library removes some whitespace between text and HTML tags, and sometimes that whitespace is important.

Preserves &nbsp; - the DOMDocument library converts &nbsp; to a space character (" ").

Preserves void tags - the DOMDocument library converts the <source> tag to <source></source>, which is invalid HTML.

The new

Inserting HTML - Sometimes you need to dynamically insert an HTML snippet into existing HTML code. The most common case is appending it to the bottom. You may also want to insert a whole HTML document, in which case the head content is added in the proper place. You can also specify an insert target for the body content. Here is an example:
$dom = new IvoPetkov\HTML5DOMDocument();
$dom->loadHTML('<!DOCTYPE html><html><body><div>Hello </div></body></html>');
// Find the div element and append an insert target to it
$dom->querySelector('div')->appendChild($dom->createInsertTarget('target1'));
// Insert the HTML snippet into the insert target
$dom->insertHTML('<html><body>world</body></html>', 'target1');
echo $dom->saveHTML();
// Output will be: <!DOCTYPE html><html><body><div>Hello world</div></body></html>
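For contrast, here is a rough sketch of a similar append done with PHP's native DOM classes only. It is not how the library works internally, just an illustration of the manual steps involved. The variable names are my own, and it assumes the snippet is well-formed XML, because DOMDocumentFragment::appendXML() uses an XML parser and not an HTML one.

```php
<?php
// Native-only sketch: uses PHP's bundled DOM extension, not HTML5DOMDocument.
$dom = new DOMDocument();
$dom->loadHTML('<!DOCTYPE html><html><body><div>Hello </div></body></html>');

// Locate the element that should receive the snippet.
$div = $dom->getElementsByTagName('div')->item(0);

// appendXML() parses its argument as XML, so the snippet must be well-formed XML.
$fragment = $dom->createDocumentFragment();
$fragment->appendXML('world');

// Appending the fragment moves its children into the div.
$div->appendChild($fragment);

echo $div->textContent, PHP_EOL;
```

There is no notion of a named insert target here and no special handling of head content when a whole document is inserted, which is the part the library takes care of.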
Querying the DOM - The querySelectorAll method is popular in the JavaScript world because it is so helpful, and now it's available for DOMDocuments in PHP. Here is an example:
$dom = new IvoPetkov\HTML5DOMDocument();
$dom->loadHTML('<!DOCTYPE html><html><body><div>Div 1</div><div>Div 2</div></body></html>');
$divElements = $dom->querySelectorAll('div');
// $divElements will be a DOMNodeList
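As a point of comparison, the same lookup with the native API alone would typically go through DOMXPath. The sketch below is only an illustration with variable names of my own choosing. It shows that the result is also a DOMNodeList and can be iterated with foreach.

```php
<?php
// Native-only sketch: the XPath expression '//div' plays the role of the 'div' selector.
$dom = new DOMDocument();
$dom->loadHTML('<!DOCTYPE html><html><body><div>Div 1</div><div>Div 2</div></body></html>');

$xpath = new DOMXPath($dom);
$divElements = $xpath->query('//div'); // a DOMNodeList

// Collect the text of every matched element.
$texts = [];
foreach ($divElements as $div) {
    $texts[] = $div->textContent;
}

echo implode(', ', $texts), PHP_EOL; // prints "Div 1, Div 2"
```

For a plain tag name the two approaches are close, but CSS selectors tend to be shorter and more familiar than the equivalent XPath once classes, ids or attribute matches are involved.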
I hope you find this library helpful. Download it and contribute on GitHub.