Alejandro Gervasio has an excellent article on using polymorphism to create a database abstraction layer.
Basically if you are wondering why or how PEAR DB, MDB, ADOdb, or PDO use the Factory pattern to provide database abstraction this walks you through it.
Maybe five years ago, this was a mind-blowing idea. But perhaps we should call into question whether this is an unnecessary abstraction.
[Database abstraction vs. Data Access after the jump]
Database abstraction
The only good purpose of database abstraction is to provide a uniform interface for database access. Instead of having to remember that the server, username, password ordering is different between mysql_connect(), or oci_connect(), you can just use new PDO() for all of them. Similar things hold for prepared queries, binds, and transactions.
I mention this because that convenience has followed my predilection for database abstraction from PEAR DB in 2000, to ADODB in 2004, to the in-vogue PDO today. I still like it and sometimes I wish they’d go even farther (like correctly abstracting the date function and date type information for me).
The seedy underside of database abstraction
Many people use a database abstraction layer for database portability and independence.
How far can you go with portability? Honestly unless you go to really heinous extremes, your code isn’t really going to be easily portable after a month using only database abstraction. Any of you who have had to do this sort of migration know what I mean. Let’s look at the SQL LIMIT clause in the example given:
SELECT * FROM table FETCH FIRST 10 rows ONLY /* DB2 */ SELECT FIRST 10 * FROM table /* Informix */ SELECT TOP 10 * FROM table /* SQL Server */ SELECT * FROM table LIMIT 10 /* MySQL and PostgrSQL */ SELECT * FROM table WHERE ROWNUM <= 10 /* Oracle */
(The astute reader will notice that my Oracle query is shorter than the one in the example I quote. That is because I take a shortcut as will be explained in a future article).
And beyond that, let’s say you want database independence and not just portability. You would not only have to be forced to using strange abstractions like ADODB->SelectLimit(), abstracted transactions, etc., you’ll have to avoid using RecordCount(), subqueries, stored procedures, or just about any interesting feature of the database.
You may have not noticed this because PHP developers have been advocating database independence via its hit man, database abstraction, for as far back as I can remember. And the M in LAMP has been MySQL 3, one of the simplest database known to man, and—just as fortunately—well tuned for web development in general (fast SELECTS, LIMIT clause) and PHP development in particular (lightweight connections).
But I wonder how much of this is becoming self-selection: How many developers out become PHP developers because they secretly despise the database development the way a Java programmer despises procedure-oreinted code? How many developers choose MySQL simply because they develop in a database independent manner? How many developers advocate database independence because all they know is MySQL 3? (Or, put another way, how many times have you heard a PHP developer lament “database X doesn’t have the LIMIT clause” when, in fact, Database X does, it’s not just called LIMIT and that developer just doesn’t know it.)
Don’t get me wrong: I use database abstraction when I have the opportunity. And, MySQL is my favorite database. As Adam likes to joke, “MySQL and PHP put the mmp in LAMP.”
But even I have to wonder: Do I only use database abstraction because, like a friend said to me in graduate school, “Terry, you’re so lazy you’d walk a mile to save a yard?” Do I use MySQL because it is like a really good whore: easy, fast, and cheap?
Database dependence
Over the next year I’m going to try to convince you why database dependence can be a Good Thing™. I don’t know exactly what I’m going to write about, but since Chris Jones has been bugging me to write more about OPAL (Oracle PHP Apache Linux), I suppose I’ll do it from approaching things using Oracle.
(Maybe I’ll get better deals from Oracle if I keep at it Or OTN will publish some of these articles and I can buy yet another digital SLR camera.)
This doesn’t mean that if you use MySQL there is nothing to learn from these posts. MySQL has come a long way and there some cool new database dependent tricks that are popping up in MySQL development. The nasty side of talking in a database-dependent way is that a trick I talk about for Oracle won’t necessarily have a corollary in MySQL or DB2. But I hope reading these (hopefully not completely boring) articles will get you thinking about examining your favorite database for some database dependent gems of your own.
Heck, I’ll be the first to admit that while MySQL is my bitch, the company I work for uses Oracle, bought at paid for. And, I don’t like buying commercial software if I use it in a manner that is nothing better than the free and open source counterpart. Nature abhors a vacuum, and so do I, especially when someone is paying for it.
Data Access and Security
I’m going to advocate Data Access instead of (or on top of) Database Abstraction. This does two things: it increases security and preserves portability. (The consequences will be extra abstraction and a lot of extra code.)
You may remember that an interview question of mine is “How do you create a single security audit point for SQL injection attacks?” An answer is Data Access Objects.
Data Access is a very easy concept. Basically it is simply: “Restrict access to the database to an interface (or set of interfaces) to that layer.” Let’s say you are building an eCard site on top of a contact management site, you might create a data access object that looks like the following:
class plx_dao
{
protected $_dbh;
function __construct($dsn, $user='', $password='', $options=array())
{
$this->_dbh = new PDO($dsn, $user, $password, $options)
}
protected function _getAssoc($sql, $bindings)
{
// TODO: no error handling
$stmt = $this->_dbh->prepare($sql);
foreach ($bindings as $idx=>$value) {
for ($i=0,$max=count($bindings); $i<$max; ++$i) {
$stmt->bindParam($i,$bindings[$i]);
}
$stmt->execute();
return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
}
class plx_dao_ecard extends plx_dao
{
const select_sql = 'SELECT * FROM ecards ORDER BY popularity,DESC LIMIT ?,?';
public function getEcards($start,$max_num=20) {
return $this->_getAssoc(self::select_sql, array((int) $start-1, (int) $max_num);
}
}
Now an audit is very easy to perform. You can audit all the SQL for a given application by looking at its data access object’s constants. You can look in the methods themselves to make sure you are correctly casting all the bind parameters. You can even filter or escape the outgoing data from the methods (though you should probably leave escaping for your template layer).
Without a Data Access Object, you have to look at your entire codebase for the database calls.
Data Access is not Data Object
You may think this look a lot like Data Objects but actually they’re polar opposites. The idea of a Data Object approach—an ORM similar to ActiveRecord1 which Ruby on Rails uses—is the database rows should be made (almost) homomorphic with the object layer. This is an entirely database independent abstraction.
The nice thing about Data Objects is the interfaces to them are well-defined. These actions have a one-to-one correspondence on the database on the back end. This fits right into the convention over configuration model of Rails development.
Data Access Objects are configuration over convention. In practical terms, a big consequence of Data Access Object is that every interface with the database has to be specified. Nothing is automatic like in a Data Object.
On the other hand, there is nothing particular about a Data Access Object from creating an interface that does a multi-table join—something very difficult to do in the case of a Data Object. And, much more importantly, it doesn’t perform like dogshit and you are free to take advantage of database dependent optimizations.
Data Access as an assist on database depedence
You can imagine that instead of returning the rows as is with
return $stmt->fetchAll(PDO::FETCH_ASSOC);
I could be fancier and manipulated the data some more in the client
$results = $stmt->fetchAll(PDO::FETCH_ASSOC);
foreach ($results as $idx=>$result) {
$tmp_fname = tempnam('/tmp','DBcrap_');
file_put_contents($tmp_fname,$result['data']);
list($results[$idx]['width'], $results[$idx]['height'], $image_type) = getimagesize($tmp_fname);
$results[$idx]['thumbnail_url'] = sprintf('%s.%s',$result['picture_id'], image_type_to_extension($image_type));
unset($results[$idx]['data']);
unlink($tmp_fname);
}
return $results;
If this logic were moved to a stored procedure, you wouldn’t have to send an entire BLOB down just to generate the width, height and mimetype. But here is the thing, if you created the stored procedure, you only need change the code inside plx_dao_ecard::getEcards() to support, without a single change in any of the client classes. A Database Developer can see nice targets for stored procedure calls simply by looking at the functional spec of the data access objects themselves, instead of the PHP developers entire codebase, or blindly finding relationships from their query logs.
Another example might be when you are paging output. Certain abstractions like SQLRelay, have the capability to suspend sessions to access cursors across independent PHP instances. To do this, simply store the cursor into an internal property of a particular DAO. On __sleep() suspend the connection and have the cursor serialized with the session. On __wakeup() you can reconnect to the same database session and get the next page without having to run a new query.
Data Access and portability
While you are certainly and obviously not database independent, Data Access does assist in portability.
First, you have the flexibility of porting one group of tables at a time (the different DAOs will, if designed right, have few or no connectivity to other tables).
Second, you’d only have to port stored procedures and database dependent features one at a time. You would not have to audit the entire site to find all the database dependent calls because they are in one place.
Most importantly, if you no longer have access to database dependent features during a port, those features can be emulated back in the data access layer. Later your Database Developer can look for different optimization approaches.
I hope this explains why I think Data Access Objects are a necessary abstraction on large scale PHP sites because they compliment Database Abstraction without creating database independence, provide audit points for security, give the database developer adequate targets for optimization, and assist in portability.













