Thursday, April 12, 2012

Remove excessive whitespace in user input field

In my controller method for handling a (potentially hostile) user input field I have the following code:



string tmptext = comment.Replace(System.Environment.NewLine, "{break was here}"); //marks line breaks for later re-insertion
tmptext = Encoder.HtmlEncode(tmptext);
//other sanitizing goes in here
tmptext = tmptext.Replace("{break was here}", "<br />");

var regex = new Regex("(<br /><br />)\\1+");
tmptext = regex.Replace(tmptext, "$1");


My goal is to preserve line breaks for typical non-malicious use and display user input in safe, htmlencoded strings. I take the user input, parse it for newline characters and place a delimiter at the line breaks. I perform the HTML encoding and reinsert the breaks. (i will likely change this to reinserting paragraphs as p tags instead of br, but for now i'm using br)



Now actually inserting real html breaks opens me up to a subtle vulnerability: the enter key. The regex.replace code is there to strip out a malicious user just standing on the enter key and filling the page with crap.



This is a fix for big crap floods of just white but still leaves me open to abuse like entering one character, two line breaks, one character, two line breaks all down the page.



My question is for a method of determining that this is abusive and failing it on validation. I'm scared that there might not be a simple procedural method to do it and instead will need heuristic techniques or bayesian filters. Hopefully, someone has an easier, better way.



PS: I can do the described abuse in the editor window here, it appears in the preview box at least, I'm not going to check if it will make it on to the site.





No comments:

Post a Comment