Content format requirements and limitations
Some of the more advanced WAF features, such as URL encryption, modify the content (HTML documents, style sheets, and JavaScript) generated by the web applications they protect. This transformation requires the content to be reasonably well-formed. This chapter explains what that entails and describes the transformation process.
Requirements for HTML content
HTML content should ideally conform to the HTML 4.01 Specification. However, strict compliance with the specification is not required, and is in fact rarely encountered on real-life web sites. The HTML parser scans the response for HTML tags. It only considers tags that may have to be rewritten, depending on which WAF features are configured. The table below lists the tags currently considered by the parser and the steps taken for each.
Tag name | Action |
---|---|
A, AREA, APPLET, IMG, BLOCKQUOTE, LABEL, LINK, BODY, DEL, INS, FRAME, FRAMESET, IFRAME, BASE | Rewrite the attribute that contains the URL (a hyperlink, a reference to a background image, a style sheet, a reference to an applet, etc.) |
FORM, INPUT, SELECT, TEXTAREA, BUTTON | Rewrite the form action URL. Form field names and values are parsed in case form signing or form encryption is used. |
SCRIPT | Rewrite the SRC attribute in case of an external JavaScript. Otherwise perform line-based regex rewriting of the tag content. |
STYLE | Perform line-based regex rewriting of the tag content. |
META | Rewrite the refresh location in case of a refresh META tag. |
If an error is encountered while parsing a tag, the result depends on the parsing mode. If the mode is set to tolerant, the rest of the HTML document is passed as-is to the client. If it is set to strict, parsing aborts. Common errors include bad quoting and illegal characters inside a tag, as shown below.
<a alt="a nice view of "Mount Matterhorn"" src="mountain.jpg">img</a>
<a alt="i forgot to close this tag" src="link.html" link</a>
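To make it easier to see why these two tags are rejected, here is a minimal sketch of an attribute tokenizer. It is an illustration only and not the parser used by nevisProxy: the function name `parseStartTag`, the regular expressions, and the error message are all assumptions made for this example.

```javascript
// Hypothetical, simplified start-tag tokenizer (not nevisProxy's actual parser).
// It accepts: <name attr="value" attr='value' attr=value attr>
function parseStartTag(input) {
  const open = /^<([A-Za-z][A-Za-z0-9]*)/.exec(input);
  if (!open) {
    throw new Error("not a start tag");
  }
  const attrs = {};
  let pos = open[0].length;

  // One attribute: mandatory leading whitespace, a name, and an optional
  // double-quoted, single-quoted, or unquoted value. The "y" (sticky) flag
  // forces the match to start exactly at lastIndex.
  const attr =
    /\s+([A-Za-z_:][-A-Za-z0-9_:.]*)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'<>=]+)))?/y;
  for (;;) {
    attr.lastIndex = pos;
    const m = attr.exec(input);
    if (!m) break;
    attrs[m[1].toLowerCase()] = m[2] ?? m[3] ?? m[4] ?? "";
    pos = attr.lastIndex;
  }

  // After the last attribute only optional whitespace and ">" (or "/>") may follow.
  const close = /\s*\/?>/y;
  close.lastIndex = pos;
  if (!close.exec(input)) {
    throw new Error("malformed tag near offset " + pos);
  }
  return { name: open[1].toLowerCase(), attrs };
}

// A well-formed tag is tokenized into its attributes.
console.log(parseStartTag('<a href="/app/page.html" class=nav>'));

// The two malformed tags from the text are rejected.
const malformed = [
  '<a alt="a nice view of "Mount Matterhorn"" src="mountain.jpg">',
  '<a alt="i forgot to close this tag" src="link.html" link</a>',
];
for (const tag of malformed) {
  try {
    parseStartTag(tag);
  } catch (err) {
    console.log("rejected:", err.message);
  }
}
```

In the first tag the nested quotes end the `alt` value early, so the tokenizer meets `Mount` where it expects whitespace or `>`. In the second tag the opening tag is never closed, so it meets `<` instead of `>`. What happens after such an error is detected depends on the parsing mode described above.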
Requirements for JavaScript
JavaScript located in external files (content-type text/javascript) and inline (embedded in HTML documents inside SCRIPT tags) is rewritten line by line, with regular expressions identifying patterns that contain URLs.
nevisProxy does not interpret the JavaScript it encounters, so it cannot handle URLs that are generated dynamically, as in the example below:
<script language="JavaScript">
var prefix = "/my/webapp";
var path = "/file.html";
openInNewWindow(prefix + path);
</script>
A similar limitation exists for client-side DOM manipulations. If a tag is inserted into the DOM tree by JavaScript, and it contains links, those links will not be subjected to the rewriting process described in the previous chapter.
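The line-based rewriting and its limitation with dynamically built URLs can be sketched as follows. This is a rough illustration under stated assumptions, not the nevisProxy implementation: `rewriteUrl` is a hypothetical stand-in for the real URL transformation, and the pattern `URL_LITERAL` is an assumed example of a rewriting expression.

```javascript
// Hypothetical stand-in for the URL transformation (for example URL encryption).
// It replaces the path with an opaque hex token under an assumed /enc/ prefix.
function rewriteUrl(url) {
  const hex = Array.from(url, (c) =>
    c.charCodeAt(0).toString(16).padStart(2, "0")
  ).join("");
  return "/enc/" + hex;
}

// Assumed pattern: a double-quoted string literal that starts with "/".
const URL_LITERAL = /"(\/[^"\s]*)"/g;

// Rewrites a script one line at a time, without interpreting it.
function rewriteScript(source) {
  return source
    .split("\n")
    .map((line) =>
      line.replace(URL_LITERAL, (_, url) => '"' + rewriteUrl(url) + '"')
    )
    .join("\n");
}

// A complete URL in a single literal is found and rewritten.
const staticCall = 'openInNewWindow("/my/webapp/file.html");';
console.log(rewriteScript(staticCall));

// A URL assembled at run time is never seen as a whole.
const dynamicCall = [
  'var prefix = "/my/webapp";',
  'var path = "/file.html";',
  "openInNewWindow(prefix + path);",
].join("\n");
console.log(rewriteScript(dynamicCall));
```

In this sketch the line `openInNewWindow(prefix + path);` passes through unchanged, because it contains no literal for the pattern to match. The two fragments are transformed independently, so their concatenation in the browser is not the rewritten form of `/my/webapp/file.html`. Whether a real expression would match the fragments at all depends on how it is written. In either case the rewriter never sees the final URL.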
Requirements for Style Sheets
Cascading Style Sheets (CSS) data in external files (content-type text/css) and inline (embedded in HTML documents inside STYLE tags) is rewritten in the same way as JavaScript. The same limitations therefore apply to this type of content.
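As a rough illustration of what line-based rewriting of style sheets could look like, the sketch below rewrites `url(...)` references one line at a time. The helper `rewriteUrl`, its `/enc` prefix, and the pattern `CSS_URL` are assumptions made for this example and are not taken from nevisProxy.

```javascript
// Hypothetical stand-in for the URL transformation; the /enc prefix is an assumption.
function rewriteUrl(url) {
  return "/enc" + url;
}

// Assumed pattern: url(...) with an optional pair of matching quotes.
const CSS_URL = /url\(\s*(['"]?)([^'")\s]+)\1\s*\)/g;

// Rewrites a style sheet one line at a time, without parsing the CSS.
function rewriteStyleSheet(css) {
  return css
    .split("\n")
    .map((line) =>
      line.replace(
        CSS_URL,
        (_, quote, url) => "url(" + quote + rewriteUrl(url) + quote + ")"
      )
    )
    .join("\n");
}

// A reference that sits on a single line is rewritten.
console.log(rewriteStyleSheet('body { background: url("/img/bg.png"); }'));

// In this sketch, a reference split across two lines is not recognized.
console.log(rewriteStyleSheet('body { background: url(\n  "/img/bg.png"); }'));
```

Because each line is matched in isolation, the second style sheet is returned unchanged in this sketch. Keeping each URL reference on a single line avoids this particular case.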