Parse HTML with ActionScript’s E4X

ActionScript includes an E4X (ECMAScript for XML) implementation that makes it easy to parse XML in code. I wanted to parse HTML for screen-scraping purposes and found that E4X allows for a syntax that loosely resembles a combination of XPath and CSS3 selectors.

Josh’s tutorial provided a very good foundation; however, I ran into problems with attribute-based filtering when the attribute name is the same as an ActionScript keyword. This was especially problematic because I needed to filter on the contents of the HTML element’s class attribute.

After much investigation I found that while the HTML tag names were in the default namespace, the attribute names were not. As a result, if I set a default namespace for the XML parsing, I could not access the attribute values by passing their names as plain strings. However, if I created a new QName object with a blank namespace, the attributes were returned as expected.

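// Match unqualified element names such as "span" against the XHTML namespace.
default xml namespace = new Namespace("http://www.w3.org/1999/xhtml");
var xml:XML = XML(htmlString);
// Attributes are in no namespace, so look up "class" through a QName with a blank namespace.
var results:XMLList = xml..span.(attribute(new QName(new Namespace(""), "class")) == "foo");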

The above will return a list of all span elements whose class attribute is exactly “foo”.
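
The same lookup can be wrapped in a small helper so the tag name and class value are not hard-coded. This is only a sketch: findByClass is a made-up name, not part of E4X, and it assumes the xml variable from the snippet above and a class attribute that holds exactly one class name. It qualifies the element name explicitly instead of relying on the default namespace.

```actionscript
// Namespace URI taken from the snippet above.
var xhtml:Namespace = new Namespace("http://www.w3.org/1999/xhtml");
// Attributes are in no namespace, so the name is built with a blank one.
var classAttr:QName = new QName(new Namespace(""), "class");

// Hypothetical helper: returns every descendant <tagName> element
// whose class attribute equals className exactly.
function findByClass(root:XML, tagName:String, className:String):XMLList {
    return root.descendants(new QName(xhtml, tagName)).(attribute(classAttr) == className);
}

// Usage: print each matching element.
for each (var span:XML in findByClass(xml, "span", "foo")) {
    trace(span.toXMLString());
}
```

Because the comparison is an exact string match, an element written as class="foo bar" would not be returned; matching individual class names would need the attribute value to be split on whitespace first.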