Filter Input-Escape Output: Security Principle and Practice

I haven’t been blogging lately because I’ve been busy. Chris said that he read my last blog entry in two minutes and thought, “What the fuck happened to Terry?” Well since he’s doing the PHP Advent Calendar, I thought I’d kill two birds with one stone.

[My PHP Advent Calendar submission after the jump]

Who is this guy? (modified)
Name
Terry Chay
Blog
terrychay.com
Biography
When Zend puts your photo on a deck of cards, you’ve either arrived in the PHP world or are a terrorist. Terry Chay is a PHP Terrorist. Being the software architect of Tagged pays the bills. When he isn’t saying politically incorrect things about web development, he is in yr Web 2.0 event, eating your lunch, taking yr photos, and fighting off yr Ruby developers with his mad ninja coding skillz.

He also likes to “draw the line at yellow.”

Location
San Francisco, California, United States

One of the strangest things about living in the Bay Area is the total lack of PHP support groups. We probably have the largest density of PHP coders (skilled and unskilled) in the world, and yet finding 50 people willing to go in on an elePHPants shipment turns you into a rock star out here.

So when there was an SF PHP Meetup, I had to go, right? The topic this month was security. In light of that, I thought I’d use my Advent entry to show that I’m not just a front-end Ajax developer guy or a PHP design patterns guy. I can also roll with the big boys and talk about Web Security.

Terry and Security—the real oxymoron

Those who heard my latest talk, “The Internet is an Ogre” would find my topic a bit ironic. After all, I was the one who said:

“Web security is a luxury, not a necessity. For any of you think good coding, design aesthetic, or web security are important, I have only one word for you: MySpace.”

If there is one thing I learned from blogging, it is that if you say outrageous over-the-top things like this, you might get a few people to listen and maybe someone to actually believe you. And, if the claim has the added bonus of actually being possibly true… that’s just (Christmas) gravy.

(Even if you’re full of shit, most people will just say you’re “perceptive”—nobody has the balls to call you out on it except those -5 on Slashdot, and everyone knows they’re like the stopped clock of the internet commenting world: only right twice a day. And besides, they send you some trackback love trying to shoot you down.)

More importantly, conferences are not fun unless you can get a rise out of Chris Shiflett, Ed Finkler, and Ilia Alshanetsky at the same time.

The truth is that the security “track” has been one of my favorite sets of interview questions. What I’m going to cover is what I started ranting about at the San Francisco PHP Meetup: this track of questions, how I’d answer them, and how this applies to my understanding of PHP and Web Security.

(Be warned: No candidate has ever answered all these questions correctly. One of my coworkers once sat in one of my interviews and afterward told me, “When you asked those security questions, I thought [the candidate] was actually going to cry.” I had (very many) headhunters complain about my interview questions to my boss.)

Intro: Practice comes from principles

A lot of you [at the PHP Meetup] are asking, “What book should I buy to learn about PHP Security?” Well I recommend Chris Shiflett’s book (website). (If you are too poor to buy this book, just go to the old PHP Security Guide instead.)

Why? Because they are impossibly small.

Web security is both really simple and an infinite mass of shit. If you start with the ad hoc approach, it will seem to only be the latter; but, if you take the time to learn the building blocks which form the language of security principles, then it all starts to make sense and becomes the former.

By virtue of being small, these guides must focus on the vocabulary and principles, without drowning you in detail. I want you to take the time to learn those things. If you don’t have the vocabulary then you can’t do web security.

To understand why, let’s take these questions I ask interview candidates:

Question: What is a SQL injection attack? Give me a simple example of how you do it.

SQL injection is when user input causes the execution of unwanted SQL on the database.

Most candidates get that part, but the second part trips up about half of all candidates. I think this is embarrassing because Randall Munroe would have no problem with it:

XKCD cartoon

The basic points I’m looking for are:

  • Have they ever thought like an attacker? If you can’t think like an attacker, you can’t think like a defender; if you can’t think up exploits, you can’t defend against exploits.
  • Did their answer have a basic escape sequence?
  • Did their answer inject data or a command?

Bonus points if they can explain why the xkcd exploit specifically doesn’t work in mysql on PHP.
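For anyone who wants to see the attack written down, here is a minimal sketch of the kind of answer I’m fishing for. The function name, table, and column are made up for illustration; the point is only to show an escape sequence (the quote mark) followed by either injected data or an injected command. No database is involved, the code just prints the SQL that would be sent.

```php
<?php
// Hypothetical vulnerable helper: pastes raw user input between quote marks.
function build_login_query(string $username): string
{
    return "SELECT * FROM users WHERE username = '" . $username . "'";
}

// Normal use: the input stays inside the quoted string.
echo build_login_query("terry"), "\n";

// Data injection: the quote mark closes the string early and the
// rest of the input becomes part of the WHERE clause.
echo build_login_query("' OR '1'='1"), "\n";

// Command injection: same escape sequence, but now a second statement
// is smuggled in and the trailing quote is commented out. Whether the
// second statement actually runs depends on the driver accepting more
// than one statement per query call.
echo build_login_query("x'; DROP TABLE users; --"), "\n";
```

All three calls use the same code path; only the input changes what the database is asked to do.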

Question: Name three ways to protect against SQL injection. For each way, explain where you’d place it in your code.

The key to answering this is to understand that the nature of the attack centers around the quote mark. So the solutions are to either remove the quote mark, escape the quote mark, or use a built-in feature to protect against the quote mark.

That is

  1. You can filter out the quote mark. You would do this on input.
  2. You can escape the quote mark using something like mysql_real_escape_string(). You would do that just before outputting to (querying) the database.
  3. You can use a prepared query on the database, if your database and PHP extension support it (some didn’t until recently).

Almost every candidate can give at least one of the above—since we use Oracle, my DB devs interviewing before me usually hit the candidate with questions about prepared queries. A decent number get all three with a little guidance!
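The three answers can be sketched side by side. This is a rough illustration, not production code: the helper names are hypothetical, the escaping function uses SQL-standard quote doubling as a stand-in for whatever your driver provides (mysql_real_escape_string() went away with the old mysql extension in PHP 7), and the prepared query assumes PDO with the SQLite driver is available so the example can run without a server.

```php
<?php
// 1. Filter on input: remove the quote mark entirely.
function filter_quotes(string $input): string
{
    return str_replace("'", '', $input);
}

// 2. Escape on output: neutralize the quote mark just before querying.
//    Quote doubling is an assumption standing in for the driver's own function.
function escape_for_sql(string $value): string
{
    return str_replace("'", "''", $value);
}

// 3. Prepared query: the value travels separately from the SQL text.
function find_user(PDO $db, string $name): array
{
    $stmt = $db->prepare('SELECT name FROM users WHERE name = ?');
    $stmt->execute([$name]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

$attack = "x' OR '1'='1";
echo filter_quotes($attack), "\n";  // the quote marks are gone
echo escape_for_sql($attack), "\n"; // the quote marks are doubled

// Only run the database part when the SQLite PDO driver is installed.
if (extension_loaded('pdo_sqlite')) {
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE users (name TEXT)');
    $db->exec("INSERT INTO users VALUES ('alice'), ('bob')");
    var_dump(find_user($db, $attack)); // no rows: the attack is just an odd name
    var_dump(find_user($db, 'alice')); // one row
}
```

Note where each one lives: the filter runs when the data arrives, while the escape and the prepared query run at the moment the data leaves for the database.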

Bonus points if they mention mysql_escape_string() and more if mysql_real_escape_string() and more if they explain the difference between these and escape(). (This has never occurred so I’m not too sure how I’d feel if someone pointed this out, probably humiliated.)

Both the database and the extension must support prepared queries, otherwise the “prepared query” is just an abstracted version of escaping. (This is not the same. What if your database abstraction emulates a prepared query using escape()? Do you know the answer to that right this second? See why I hate database abstraction?)

Almost no one knows where to correctly place this code in the application server. That’s because they think like hackers, not architects. I’m hoping the new filter extension changes this. I’ve had a number of people argue with me about the correct placement of the filtering on input or the correct placement of the escaping on output. Many were competent Perl coders. By the end of this entry I hope you’ll understand why I’m right and they’re wrong (and why Perl coders self-select themselves to mess this question up). ;-)

Question: Which protection against SQL injection is the right approach?

This is a trick question. My answer is that I’d filter on input and I’d use prepared queries if possible or escaping (not both, obviously). What I’m looking for is that there is no single best way (though clearly, I’ll give them some props if they mention that prepared queries are better than the others).

Why? That’s just good security!

Security is not an impenetrable wall. It sits on top of a mountain surrounded by a decently-sized wall, with a moat in front of it, and a healthy number of alert guards on the battlements.

or:

Like an ogre, good web security has “layers.” ;-)

I can go for hours telling stories of people who haven’t understood this principle and paid the price. What you think is an unassailable fortress will fall like Dien Bien Phu to a clever attacker. In this case, if you chose just one security model, ask yourself about the following use cases:

  • What if the data field is a person’s name and he’s “Tim O’Reilly”?
  • What happens if at a later point, I decide to send the data somewhere (file, memcache, back to the user) before, or instead of, putting it in the database?
  • What if I migrate from MySQL to SQL Server, which has a different way of escaping?
    // SQL Server escapes a single quote by doubling it, not with a backslash.
    function mssql_escape_string($string) {
        return str_replace("'","''",$string);
    }
  • What if I migrate to a data store that doesn’t support prepared queries?

Software changes with business needs. Business needs change very often in the web world. Good software is flexible enough to change with it. (Exception: Ruby on Rails. In those cases, the problem isn’t the software, the problem clearly is you. If you don’t “get it,” then DHH has “two words for you…” :-D )

The principle here is “If the security protocol isn’t too inconvenient, always implement it and put it as early as reasonably possible in the application flow.”

More on this later.

Question: How would you create a single security audit point for injection attacks?

People have already missed the earlier questions so I never ask this anymore. I’m tired of having your headhunter shit talk me behind my back to everyone in the Bay Area.

My answer would be a Data Access pattern. If you are a framework guy, you’d use a persistence layer (like ActiveRecord) to abstract yourself entirely from the database because, apparently, LEFT JOINs are just too damn hard…

Given how many people bomb this series in interviews, I’m inclined to start agreeing with that statement.
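Here is a minimal sketch of what I mean by a single audit point, under some assumptions: the class name, method names, and table are invented, and the “executor” is a callable standing in for whatever actually talks to your database (in real life it would wrap a prepared statement). The idea it tries to show is that SQL text lives in exactly one class, is always a constant string with placeholders, and user data only ever travels in the parameter array.

```php
<?php
// Hypothetical data-access class: the only place allowed to contain SQL text.
final class UserStore
{
    /** @var callable(string, array): array */
    private $execute;

    public function __construct(callable $execute)
    {
        $this->execute = $execute;
    }

    public function findByName(string $name): array
    {
        // Constant SQL with a placeholder; the value goes in separately.
        return ($this->execute)(
            'SELECT id, name FROM users WHERE name = :name',
            [':name' => $name]
        );
    }

    public function rename(int $id, string $newName): array
    {
        return ($this->execute)(
            'UPDATE users SET name = :name WHERE id = :id',
            [':name' => $newName, ':id' => $id]
        );
    }
}

// Stand-in executor that just records what it was asked to run.
$log = [];
$store = new UserStore(function (string $sql, array $params) use (&$log): array {
    $log[] = [$sql, $params];
    return [];
});

$store->findByName("x' OR '1'='1");
echo $log[0][0], "\n"; // the SQL text is unchanged by the hostile input
```

To audit for injection you read one file and check that no method concatenates a variable into its SQL string.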

Question: What is Cross-Site Scripting (XSS)? Cross Site Request Forgery (CSRF)? the Session Fixation attack? Describe an example of how you would implement the attack (and how to defend against it in your code)

Again, the reason for the last part is to give the candidate an opportunity to apply this stuff in practice. To think concretely like an attacker shows real knowledge beyond the simple theory. Again, if you don’t understand the attack, you can’t defend against it. Again, these principles come from the practice.

Here is the definition of XSS, XSRF, and Session fixation.

I’ll confess that the only reason that I ask about session fixation is because, once in a while, I meet a candidate who can define XSS and CSRF easily, and Sadistic Terry wants to see if he can break them. Session fixation is the web security equivalent of asking people how to laugh in hexadecimal. It’s an obscure vulnerability easily-corrected and known by us old-timers. This makes it a fun one to lord over people. (How to laugh in hex: 48 41 48 41.)

CSRF is important on today’s Ajaxified websites. But I’ll continue with XSS (what it is, how to do it, how to protect against it) to hook off of later in this series.
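Since I’m skipping past CSRF, here is one common defense in sketch form, assuming a per-session secret token that every state-changing form must echo back. The function names are hypothetical and a plain array stands in for $_SESSION so the example runs anywhere. (For session fixation, the usual correction is to call session_regenerate_id(true) when the user logs in.)

```php
<?php
// Issue a random token and remember it in the session.
function csrf_issue_token(array &$session): string
{
    $session['csrf_token'] = bin2hex(random_bytes(32));
    return $session['csrf_token'];
}

// Accept the request only if the submitted token matches the stored one.
function csrf_check_token(array $session, ?string $submitted): bool
{
    return isset($session['csrf_token'])
        && is_string($submitted)
        && hash_equals($session['csrf_token'], $submitted); // constant-time compare
}

$session = [];                        // stand-in for $_SESSION
$token = csrf_issue_token($session);  // embed this in a hidden form field

var_dump(csrf_check_token($session, $token));   // true: form came from our page
var_dump(csrf_check_token($session, 'forged')); // false: attacker cannot guess it
```

A forged request from another site carries the victim’s cookies but not the token, so the check fails.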

Question: Explain how the MySpace worm works? Give an example of using CSRF to determine login state on a remote site?

I don’t ask these questions normally. But some of the more belligerent candidates bitch about the previous series as being “just terminology.” Basically this dismissal is the security/interview equivalent of “I’m not really into Pokémon.”

You better be into THIS Pokémon

Really do they think I ask these questions for fun?

If you can’t answer the above two questions, figure them out on your own. Understanding how to answer these is how you’ll get a zen-like ability to quickly understand security vulnerabilities to build security practice once you’ve taken the time to understand these simple security principles (in this case what and how to combine: XSS, CSRF, and Javascript exception handling).

(Aside: When I saw Chris Shiflett’s second twitter, I was really pissed. He found the same vulnerability in it as me. I was sitting on that vulnerability for months and was going to pass the time one day by writing a twitter worm.)

Question: What does “filter input-escape output” mean? Give an example of filtering? Give an example of escaping?

Here is Wikipedia’s definition, but put simply it is the principle that all filtering should be done on the input to the application server and escaping should be done on the output from the application server.

The problem is, even if people can intuit that or have heard of it (surprisingly few candidates have, but most can guess), many haven’t thought about what is filtering and what is escaping. But that’s why I asked the other questions: they can already answer this!

Taking the two examples above (SQL injection and XSS) and applying this principle:

  • To filter SQL injection, you strip out the quote mark or cast the input as an integer (if you are expecting integers), etc.
  • To filter XSS attacks, you use strip_tags(), regular expressions, or a combination of html tidy normalization and a DOM walker that implements a white-list or black-list filter. This filters out the <script> tags as well as injections into CSS—you may need a CSS parser there unless you strip out all style attributes and <style> tags.
  • To escape SQL injection, you use something like mysql_real_escape_string() or prepared queries.
  • To escape XSS attacks, you use htmlspecialchars() or htmlentities(). Both will do things like replace < with &lt; but htmlentities does a bit more if you know the output is to HTML and not XML.

(It might have helped if we had called it “encoding” instead of “escaping.” But we don’t, so deal.)
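To make the difference concrete, here is a small sketch using only built-in PHP functions. The sample strings are made up. Filtering throws part of the input away; escaping keeps every character but encodes it so the destination treats it as data.

```php
<?php
$comment = '<script>alert(1)</script>Hi from <b>Tim</b>';

// Filtering (on input): the tags are removed and cannot be recovered.
$filtered = strip_tags($comment);
echo $filtered, "\n"; // alert(1)Hi from Tim

// Escaping (on output to HTML): nothing is lost, the markup is just inert.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";
// &lt;script&gt;alert(1)&lt;/script&gt;Hi from &lt;b&gt;Tim&lt;/b&gt;

// Filtering for SQL when an integer is expected: cast it.
$id = (int) "42; DROP TABLE users";
var_dump($id); // int(42)
```

Notice that strip_tags() leaves the script body behind as plain text, which is harmless in HTML but is also a reminder that a filter changes the data for good.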

And from this example, the rest follows.

  • You want to filter on the input because of the practice of putting as much security as possible as early as possible in the application. The input is as early as you can get. This way all the stuff downstream gets the benefit of this protection.
  • Note that some candidates misinterpret “filter input” as putting it in client side code. That’s easily bypassed by the simplest script kiddie scripts/spam bots. They forget that they’re a PHP developer, not a front end developer: we are talking input into the PHP application not the output of the web browser—though it doesn’t hurt to put it there too.
  • You have to escape on output because escaping functions are going to be different depending on where the data goes. If it goes to the MySQL database, you use mysql_real_escape_string(), which generates something slightly different than when sending it as an argument on the command line (escapeshellarg()), which is different again from when outputting back to the user (htmlentities()). (Certainly you can see that XSS escaping looks totally different than SQL escaping!)
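A quick sketch of that last point: one raw value, four destinations, four different encodings. The value is invented, and the SQL line uses quote doubling as an assumed stand-in for your database driver’s escaping function.

```php
<?php
$name = "Tim O'Reilly <tim@example.com>";

// Same data, escaped at the last moment for each destination.
$forHtml  = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');  // back to the browser
$forShell = escapeshellarg($name);                         // command-line argument
$forSql   = "'" . str_replace("'", "''", $name) . "'";     // assumed SQL literal quoting
$forJson  = json_encode($name);                            // Ajax payload

echo $forHtml, "\n";  // Tim O&#039;Reilly &lt;tim@example.com&gt;
echo $forShell, "\n"; // quoting depends on the operating system
echo $forSql, "\n";   // 'Tim O''Reilly <tim@example.com>'
echo $forJson, "\n";  // "Tim O'Reilly <tim@example.com>"
```

If you had escaped on input, you would have had to pick one of these in advance, and it would be wrong for the other three.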

And the principles apply again and again to XSS as they did to SQL injection.

  • What if I want the HTML to be outputted as HTML because I run a MySpace-like site with HTML editing and customization? You can’t have any escaping, but you still have the filtering. And you’ve done that filtering on input, right?
  • When you know it isn’t HTML, you should always escape against HTML strings on output to the template. And you have the freedom of strongly filtering on input by using strip_tags(). This way, you are protected against XSS injection back to the user. XSS forms the basis of many forms of session hijacking and XSRF worms, so that’s a good thing.
  • Therefore, you apply both security rules because no one is going to have complete coverage. And because you don’t want security to be an impenetrable wall, right?

And this leads to further understanding of the concept of “input” and “output.” Input doesn’t mean “input from the user’s browser” it means “input into the application server” and “output” doesn’t mean “output to the returned HTML” it means “output from the application server to any external source.”

PHP still does what it does best (act as glue code) but with the hard-won security principles tacked on. In the old days, this meant gluing the user to a database back end and back, but on a modern website, this means so much more. We’ve gone well beyond a “3-tier architecture” or even “n-tier” to a complex architecture of highly cohesive external services with simple, standard interfaces, one of which is the user of the website via html (i.e. user via ajax payload, 3rd party api in XML REST, JSON, SOAP, XMLRPC, or binary protocols, memcache, database, command line, smtp/sendmail, back-end business objects, TCP, UDP, or direct calls via SWIG or extension or whatever.)

Aside: principle to practice

I’ll give you an example of how I applied the above principle of “filter input, escape output” at Tagged. Because we’re a MySpace-like social network, we have to base our input filtering of certain fields on a blacklist of illegal tags, properties, and urls instead of a whitelist of allowed tags (which is more common in the libraries out there). I knew I could not “build an impenetrable wall” with a blacklist: the spec is always changing and there is an “infinite mass of shit” in security. I also needed a system that was flexible to business needs (in other words, I had to be able to poke very selective holes in my security model based on business deals).

So instead, I did what I had time to do (write a custom html input filter), and then applied the principle of filter input-escape output to the system architecture. But not just with a kick-ass user input filter (filter input); not on user output (remember, I can’t escape the HTML in this case); not just on escaping to the database (via stored procedure to Oracle); but also on input from the database and memcache stores.

In other words, I encoded the version number of the HTML input filter on output to the database and memcache, so on input, I could check to see if it was out of date and run the html filter again! Why?

Because I knew that one day we would be hacked. And we were. Within hours, an XSS worm injected into the style tags of the widgets on our site had “infected” 60,000 user profiles. Even if we stopped it on user input, we still would have had the 60,000 user profiles to deal with; across all the databases in the federation, containing 50 million user profiles, it would have taken days to clean up the mess!

Instead, I asked for a copy of the exploit, figured out the nature of the attack, added a CSS parser (which I had lying around because I knew about this attack, but I was too lazy to look up its exact nature), hooked it up to the html filtering object and bumped the version number.

All new uploaded content was filtered against it. And as users used the site, they were fixing the site in memcache. Then, at our leisure, we could test the new filter against regressions and slowly remove the attack permanently from the database. All using the same code and all because of the principle:

Filter input, escape output. Know the difference between filtering and escaping. And we mean all inputs and all outputs.

And I did this without building an “impenetrable wall,” but a series of barriers, moats, and what not—every input an opportunity to filter, every output an opportunity to escape, the flexibility to deal with breaches designed in the architecture.

See?

Layers. Like an ogre!

Instead of panicking or a site outage, two hours later, I was headed to a party and getting royally wasted.
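The version-stamping trick from this aside can be sketched in a few lines. This is not the Tagged code: the names are hypothetical, an array stands in for the database and memcache, and the regex “filter” is a toy placeholder for a real HTML and CSS parser (do not filter HTML with regexes in production). What it does try to show is the mechanism: store the filter version on output, check it on input from the store, and re-filter anything stale.

```php
<?php
const FILTER_VERSION = 2; // bump this whenever the filter rules change

// Toy stand-in for the real HTML filter. Assume version 1 only removed
// <script> blocks and version 2 also removes <style> blocks.
function filter_html(string $html): string
{
    $html = preg_replace('#<script\b[^>]*>.*?</script>#is', '', $html);
    $html = preg_replace('#<style\b[^>]*>.*?</style>#is', '', $html);
    return $html;
}

// Output to the store: filter, then stamp with the current version.
function store_profile(array &$store, string $key, string $rawHtml): void
{
    $store[$key] = ['v' => FILTER_VERSION, 'html' => filter_html($rawHtml)];
}

// Input from the store: anything stamped with an old version is re-filtered.
function load_profile(array &$store, string $key): string
{
    $record = $store[$key];
    if ($record['v'] < FILTER_VERSION) {
        $record = ['v' => FILTER_VERSION, 'html' => filter_html($record['html'])];
        $store[$key] = $record; // the site heals itself as users browse
    }
    return $record['html'];
}

// A record saved back when the filter was at version 1 and let <style> through.
$store = [
    'profile:7' => ['v' => 1, 'html' => '<style>body{background:url(evil)}</style><p>hi</p>'],
];

echo load_profile($store, 'profile:7'), "\n"; // <p>hi</p>
echo $store['profile:7']['v'], "\n";          // 2
```

Bumping one constant is what turns every read from the database or cache into a cleanup pass.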

Question: What are “Magic Quotes” and why is it bad?

This is the gravy. Many of you know, “Magic Quotes are bad.” But did you know that they’re monotonically bad? (For instance, a case can be made for keeping register_globals, but none can be made for magic_quotes.)

And can you answer this question this simply?

“Magic Quotes are bad because they escape input, and you should filter input, escape output!”

Now you can!

Hard experience has taught people developing large PHP sites that you should escape only on output. Anyone who has written PHP code that is deployed at scale or on a variety of hosted services knows the nightmare that is magic_quotes. (Sure, you could have escaped on input, but you would have to know what escapes were applied and carry that with the variable as metadata, most commonly by changing the variable name. It gets ugly very quickly.)

Put simply:

magic_quotes has the implicit assumption that the output of all input is a mysql/postgres database or command line and the attacker is not that clever.
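You can no longer turn magic_quotes on (the setting was removed in PHP 5.4), but its effect is easy to simulate, on the assumption that it behaved roughly like running addslashes() over all request data. The sample name is made up. Watch what happens when the “pre-escaped” value goes anywhere other than the one destination the escape was meant for.

```php
<?php
$fromBrowser = "Tim O'Reilly";

// What magic_quotes handed your script: escaped on input.
$magic = addslashes($fromBrowser);
echo $magic, "\n"; // Tim O\'Reilly

// Destination 1: back to the user as HTML. The backslash is now visible junk.
$html = htmlspecialchars($magic, ENT_QUOTES, 'UTF-8');
echo $html, "\n"; // Tim O\&#039;Reilly

// Destination 2: a careful coder escapes for the database anyway,
// so the value is escaped twice and the stored name is corrupted.
$twice = addslashes($magic);
echo $twice, "\n"; // Tim O\\\'Reilly
```

The only fix is to undo the escape with stripslashes() everywhere, which means every piece of code has to know whether the setting was on.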

Which brings us full circle to the Perl developers who continue to argue with me that I can (and should) escape the input. Perl is “311 code” (chmod 311 *.pl): the writer can write and execute, the company and the world can execute, and nobody can read it! TIMTOWTDI means the Perl coder simply isn’t used to the concept that their code will be broken into parts and edited by a team of people.

While PHP code may have had its origins in “Rasmus wants to build his Personal Home Page and he wants a template tool to do it,” it now has to power large scale, complex websites such as Yahoo! and Facebook (and Tagged).

Code has become complex. magic_quotes made sense when we were naïve about inputs and outputs (input meant from the user and output meant to the database); magic_quotes made sense when we didn’t really understand the difference between filtering and escaping.

But many PHP developers spent many years trying to pull things like magic_quotes out of the system, out of the wreckage of code that had caused many late-night debugging sessions and broken Sourceforge PHP application installs.

Are you going to learn from their experiences, or are you going to doom yourself to repeat them?

And that is the nature of security:

Practice creates good principles. Those principles make for good practice.

Now we’re done

And after this, honestly can you see why I don’t understand why I have headhunters shit-talking me behind my back? I guess they really want websites to fall flat on their asses and to be poorly-architected, vulnerable piles.

See that picture of me at the top of this entry? I’m the PHP Security Grinch and I’m here to tell you, my heart isn’t going to grow three sizes this season just because you say my questions are too hard. Bah! Humbug!

Why? Because I actually want your website to stay online, and I want you to be able to enjoy this holiday season without fearing being “on call” for a late night security patch session.

Parting shot

Returning to the statement that “Security is a luxury” that touched off this entry…

Before one interview, a candidate’s resume listed web security under skills. Of course they got this battery of questions from me, and, as luck would have it, it was especially egregious. I was about to move on but the candidate, clearly frustrated by this experience, said to me, “Look, if you give me a website, I can make it secure.”

I was crestfallen—the person didn’t know what a SQL injection attack was! Then, I realized, Wow! He’s right. Shit. I can make a web server unhackable too: just disconnect it from the internet.

So I suppose this is as good an Advent tip as any:

If you absolutely have to make your website secure this Advent, go to the colo and pull out all your ethernet cables.

Happy Holidays!

Update: the release post. This entry is unlocked.

8 thoughts on “Filter Input-Escape Output: Security Principle and Practice”

  1. Really great explanation of the filter/escape concept. Funny too! I really enjoyed reading it. As far as I can see it all holds true today. Do you agree, or has the security picture changed?

    1. Thanks for reading!

      I think filter-input/escape-output is a security principle.

      As Stephen Covey might say, the reason you should be principle-centered is that basing yourself on other things (family, work, self, the PHP language) is subject to the vagaries that those things might and will change. But your principles do not. :-)
