While this library doesn't cover the full gamut of possible weirdness that HTML provides, it does handle a lot of the most obvious stuff. All of the following are accounted for:
- Unclosed Tags:
HTMLtoXML("<p><b>Hello") == '<p><b>Hello</b></p>'
- Empty Elements:
HTMLtoXML("<img src=test.jpg>") == '<img src="test.jpg"/>'
- Block vs. Inline Elements:
HTMLtoXML("<b>Hello <p>John") == '<b>Hello </b><p>John</p>'
- Self-closing Elements (a new <p> implicitly closes the one still open):
HTMLtoXML("<p>Hello<p>World") == '<p>Hello</p><p>World</p>'
- Attributes Without Values:
HTMLtoXML("<input disabled>") == '<input disabled="disabled"/>'
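All five behaviours above can be approximated with a single stack of open elements. The following is a minimal sketch and not the library's actual code. `htmlToXml` and the element sets are hypothetical and deliberately incomplete. Tags are found with a simple regular expression, so a `>` inside an attribute value is not handled, and text is passed through without escaping.

```javascript
// Hypothetical, minimal stack-based HTML-to-XML sketch (not the library's code).
// The element sets below are illustrative and deliberately incomplete.
const EMPTY = new Set(["area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"]);
const BLOCK = new Set(["address", "blockquote", "dd", "div", "dl", "dt", "form", "h1", "h2", "h3",
  "h4", "h5", "h6", "hr", "li", "ol", "p", "pre", "table", "tbody", "td", "tfoot", "th", "thead", "tr", "ul"]);
const INLINE = new Set(["a", "abbr", "b", "big", "code", "em", "font", "i", "label", "q", "s",
  "small", "span", "strike", "strong", "sub", "sup", "tt", "u"]);
const CLOSE_SELF = new Set(["dd", "dt", "li", "option", "p", "td", "th", "tr"]);
const FILL_ATTRS = new Set(["checked", "compact", "disabled", "multiple", "nowrap", "readonly", "selected"]);

function htmlToXml(html) {
  const out = [];
  const stack = []; // names of currently open (non-empty) elements
  const top = () => stack[stack.length - 1];
  const close = () => out.push(`</${stack.pop()}>`);

  // Tokens: a start/end tag, a run of text, or a stray "<" treated as text.
  for (const [token] of html.matchAll(/<\/?[a-zA-Z][^>]*>|[^<]+|</g)) {
    if (token.startsWith("</")) {
      // End tag: close everything opened since the matching start tag, if any.
      const name = token.slice(2).match(/^[a-zA-Z][\w:-]*/)[0].toLowerCase();
      const idx = stack.lastIndexOf(name);
      while (idx >= 0 && stack.length > idx) close();
    } else if (token.length > 1 && token.startsWith("<")) {
      const name = token.slice(1).match(/^[a-zA-Z][\w:-]*/)[0].toLowerCase();

      // Opening a block element implicitly closes any open inline elements.
      if (BLOCK.has(name)) while (stack.length && INLINE.has(top())) close();
      // Elements such as <p> and <li> close a still-open element of the same name.
      if (CLOSE_SELF.has(name) && top() === name) close();

      // Parse attributes; value-less ones in FILL_ATTRS get their own name as value.
      const rest = token.slice(1 + name.length, -1).replace(/\/\s*$/, "");
      let attrs = "";
      for (const m of rest.matchAll(/([^\s="'\/]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"']+)))?/g)) {
        const value = m[2] ?? m[3] ?? m[4] ?? (FILL_ATTRS.has(m[1].toLowerCase()) ? m[1] : "");
        attrs += ` ${m[1]}="${value.replace(/"/g, "&quot;")}"`;
      }

      if (EMPTY.has(name)) {
        out.push(`<${name}${attrs}/>`); // empty elements are written self-closed
      } else {
        out.push(`<${name}${attrs}>`);
        stack.push(name);
      }
    } else {
      out.push(token); // text is copied through unchanged (no entity escaping here)
    }
  }
  while (stack.length) close(); // close anything still open at end of input
  return out.join("");
}

console.log(htmlToXml("<b>Hello <p>John")); // prints "<b>Hello </b><p>John</p>"
```

Each rule only inspects the top of the stack and never consults a content model, which is also why a sketch like this accepts misplaced elements without complaint.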
Note: It does not take into account where in the document an element should exist. Right now you can put block elements in a head, or a th inside a p, and it'll happily accept them. It's not entirely clear how the logic should work for those cases, but it's something that I'm open to exploring.