Drupal 6 vs Drupal 7 Database Primer - Part 2

10 Oct in api, drupal, drupal 6, drupal 7, pdo, planet-drupal, transactions
Printer-friendly version

In part 1 of this series of articles, I discussed D7’s new database layer and we started looking at what to expect with examples of static and dynamic queries. Those queries are for fetching information from a database, but what’s changed from D6 to D7 when it comes to manipulating the information in our database? The answer: a lot.
 We’ll start with INSERT queries. In D6 we were relegated to two options when it came to INSERT queries. We could either write out an INSERT query the long way:

<?php
  $query = "INSERT INTO {node} (title, uid, created) VALUES ('%s', %d, %d)";
  db_query($query, 'Example', 1, $timestamp);
?>

Or, we could use the function drupal_write_record():

<?php
  $record = new stdClass();
  $record->title = 'Example';
  $record->uid = 1;
  $record->created = time();

  drupal_write_record('node', $record);
?>

This was a vast improvement over D5 certainly, but could still result in some cumbersome code. Looking back to the first article in this series I discussed how D7's database layer is now built on top of PDO. Let's see how we handle an INSERT query in D7 utilizing this new API.
 In D7, first we need to create an insert object:

<?php
  $query = db_insert('node', $options);
?>

Looking at the D7 handbook for INSERT queries we learn that this "query" object uses what is referred to as a "fluent API", meaning that all methods – with the exception of the execute() method – return the query object. That means we can chain our methods together and keep our insert queries compact:

<?php
  $nid = db_insert('node')
    ->fields(array(
      'title' => 'Example',
      'uid' => 1,
      'created' => REQUEST_TIME,
    ))
    ->execute();
?>

Nice. As you can see, we pass the db_insert() function a table name, then we chain the fields() method to it. We pass the fields() method an associative array (it can actually accept several different types of parameters, as we'll see shortly), where the array keys correspond to our table's column names. Finally, we chain the execute() method to perform the INSERT query. The execute() method, as mentioned earlier, does not return the query object, but instead returns the value of an auto-increment field in the table for the record just inserted. As the D7 handbook states, if the table does not have an auto-increment field the value that execute() returns will be undefined and "should not be trusted". Hence, in the example above, we assign the return value of the execute() method to a variable named $nid. Makes sense to me.

The last example is great for inserting a single record into a table. But, more often than not, we are going to want to insert multiple records into a table in one go. Remember that the fields() method can accept different types of parameters. We can pass the fields() method an indexed array of column names and then chain another method to it: values(), which takes an associative array. This method allows us to create several records at one time. Another example from the handbook should make this clearer:

<?php
  $values = array(
    array(
      'title' => 'Example',
      'uid' => 1,
      'created' => REQUEST_TIME,
    ),
    array(
      'title' => 'Example 2',
      'uid' => 1,
      'created' => REQUEST_TIME,
    ),
    array(
      'title' => 'Example 3',
      'uid' => 2,
      'created' => REQUEST_TIME,
    ),
  );

  $query = db_insert('node')->fields(array('title', 'uid', 'created'));

  foreach ($values as $record) {
    $query->values($record);
  }

  $query->execute();
?>

Pardon me while I pick my jaw up from the floor. How cool is this? We create a $values array that contains the records we want to insert, again with keys corresponding to column names. Then we create our query object and inform it of which table and fields we will be inserting data into. We then loop through the $values array, passing each record to our query object's values() method. Finally, with one call to the execute() method we insert three new rows into the node table. Sweet and simple.

We can also insert data based on a select query. For brevity's sake, I will refer you to the handbook page to see this in all its glory, but in essence you simply build a select query just how we did in the first article of this series, then we pass that select query object to another method of our insert query object: from(). http://drupal.org/node/310079

UPDATE queries follow a similar convention as INSERT queries in D7. As a quick aside, in D6, there is an "issue" with using drupal_write_record() in that you have to retrieve the value of the primary key of the table row you are wanting to update. Otherwise, the UPDATE query will silently fail. Let's assume we're performing an UPDATE query in D6 from within hook_nodeapi():

<?php
  // this will not work as expected
  $record = new stdClass();
  $record->title = 'New Title';

  drupal_write_record('node', $record, 'nid');

  // this does work, however
  $record = new stdClass();
  $record->title = 'New Title';
  $record->nid = $node->nid;

  drupal_write_record('node', $record, 'nid');
?>

I, like many other module developers, had to find this out the hard way. Not really a "big" issue, but an annoyance nonetheless in that the UPDATE query silently fails. Silent failures === beating your head against the wall until you trial-and-error out the issue, which all equates to a unneeded loss of time and productivity. But I digress.

As stated, D7 UPDATE queries are nearly identical to INSERT queries with a few caveats. As with INSERT queries, we begin with creating a new query object by calling the function db_update():

<?php
  $query = db_update('node', $options);
?>

This object, just like an INSERT query object is chainable, too. This object also has a fields() method where we set our column names and corresponding values, but unlike our INSERT query object fields() method, an UPDATE query object's fields() method can ONLY accept an associative array. We inform our UPDATE query of which row or rows to update by chaining the method condition() to the object, and finally perform the query with the execute() method. An example from the handbook:

<?php
  $num_updated = db_update('node')
    ->fields(array(
      'uid' => 5,
      'status' => 1,
    ))
    ->condition('created', REQUEST_TIME - 3600, '>=')
    ->execute();
?>

For more information on the awesomeness that is the condition() method, you can check out the handbook page on it here: http://drupal.org/node/310086

An UPDATE query object's execute() method in this case returns the number of rows affected by the query. However, this does not equate to the number of rows "matched" by the query. As the handbook tells us:

"The execute() method will return the number of rows affected by the query. Note that affected is not the same as matched. In the above query, an existing record that already has a uid of 5 and status of 1 will be matched, but since the data in it does not change it will not be affected by the query and therefore not be counted in the return value. As a side effect, that makes Update queries ineffective for determining if a given record already exists."

Thanks go out to the Drupal documentation team for saving me that potential future headache.

DELETE queries are nearly identical to UPDATE queries except we call the function db_delete():

<?php
  $num_deleted = db_delete('node')
    ->condition('nid', 5)
    ->execute();
?>

The execute() method for DELETE query objects returns the number of rows of deleted as a result. Not much else to cover on that front.

The last type of query I'll cover here are called MERGE queries. You may have also heard this out in the field referred to as "UPSERT" queries. These queries are a hybrid of INSERT and UPDATE queries, in that if a given condition is met either an INSERT query will be performed or and UPDATE query will be performed. This probably has me more giddy than any of the other type of queries because I can recount numerous times I have had to write long if/else statements for performing this task.

In its simplest form, MERGE queries look like the following:

<?php
  db_merge('example')
    ->key(array('name' => $name))
    ->fields(array(
      'field1' => $value1,
      'field2' => $value2,
    ))
    ->execute();
?>

We use the function db_merge() and pass it a table name. We chain the key() method which accepts an associative array whose key is the a key field in the table and we assign it a value of $name. We then chain the fields() method which, as in INSERT and UPDATE queries, contains the actual data we will be manipulating on which columns. And finally, we call the execute() method. In this example, if a row exists in our table whose 'name' column contains the value of $name, an UPDATE query will be performed setting 'field1' and 'field2' values appropriately. If there is NOT a 'name' field with the value of $name, a new row will be created with the value of $name we passed to the key() method.

For. Real. We can also get more granular with the conditions we want to apply to determine if we perform an UPDATE or INSERT query. Again, for brevity's sake (since this can get a little complicated) I will refer you to the handbook page on MERGE queries: http://drupal.org/node/310085

One final topic I would like to cover, as it is very relevant to performing these types of queries, is transactions. D6 and below had ZERO support for transactions and, as such, we module developers had to be very diligent in preventing data loss should a query fail halfway through execution. This typically meant lots of cumbersome error handling code, and even then the chance of data loss was still a very real risk.

In D7, transactions are very simple to implement. We start a new transaction by calling the function db_transaction(). Here's the example given in the handbook:

<?php
function my_transaction_function() {
  // The transaction opens here.
  $txn = db_transaction();

  $id = db_insert('example')
    ->fields(array(
      'field1' => 'mystring',
      'field2' => 5,
    ))
    ->execute();

  my_other_function($id);

  return $id;
  // $txn goes out of scope here, and the entire transaction commits.
}

function my_other_function($id) {
  // The transaction is still open here.

  if ($id % 2 == 0) {
    db_update('example')
      ->condition('id', $id)
      ->fields(array('field2' => 10))
      ->execute();
  }
}
?>

As you can see, when the variable $txn goes out of scope (or is destroyed as far as PHP is concerned), the transaction is committed. Until that happens, the transaction will not be executed. For the future of Drupal and module developers alike, this means we can rest a little easier at night knowing that data loss has been greatly minimized should things go wrong (like your DB server unexpectedly goes down).

I, for one, am just fine with that.

The next article in this series will cover some of the new hooks in D7 that have me very excited. Hooks like hook_page_alter() and hook_filetransfer_backends() are going to change the Drupal landscape greatly, I feel, and I can't wait to get my hands on them.

Until next time, happy coding.

Comments

Most of the time you don't want transactions.

95-99.9% of the time you don't want to use transactions.  For three reasons:

  • For small database opperations (inserting just a couple rows), the cost of setting up and committing the transaction is significantly more expensive than just doing the insert. 
  • When performing complex inserts that take a lot of time the database will lock any rows in question and prevent other queries from accessing the rows until the transaction is complete. 
  • With the web, having high availability is usually more important than potential data loss.  Having code written in such a way that a comment could potentially disappear on its way into the database is not that bad, having page loads blocked by excessive database transactions is less acceptable. 
Bob Marchman's picture

Up for debate

Thanks for the comment, but I have to respectfully disagree with you here. I believe 95-99.9% of time we SHOULD be using transactions. Here's why (IMHO):

  • You are correct in that transactions do come with more overhead associated. However, I believe when comparing this overhead to the potential overhead of having to mitigate data loss, this becomes negligible. I ran a quick and dirty benchmark just to see how great the difference actually is. I wrote two functions that manual insert 10,000 new rows into the node table. One function performs this as a transaction, the other does not. The transactional query took 13.729 seconds whereas the non-transactional query took 5.542 seconds. About an 8 second difference for 10,000 rows. So, there is no argument that transactions cost more. However, I've had to spend more than a day's worth of work cleaning up a database after a server outage. While Drupal does lock db tables, if an outage occurs, someone still has to unlock those tables and clean up any data that only got half-written/deleted from the database when the outage happened. No bueno. I'll take that 8 second difference any day over having to deal with that.

 

  • Yes, high availability is very important, but that availability is moot if you end up with a page load that is attempting to retrieve bad data from your database. Transactions minimize the chances of white screens or erroneous data. Also, I think it's more relevant to look at the type of data being stored in your database and not how much, when ultimately making a decision on whether or not to use transactions. I may only have to insert data into the db once a day, but if that data is of a very sensitive nature, or if my site is dependent on that data to actually "work", then yes, I want to use transactions every time. There are implications here beyond performance. There's also legal and monetary implications. Performance sometimes has to take a backseat. Besides, I would rather sleep easy at night knowing my data is safe and tune my site in other areas to compensate accordingly.

 Just sayin' :)

Another small note :)

The really awesome thing about multi-row inserts is that it will automatically use a single query on supported databases and fall back to a query per row if they don't. So it is finally possible to insert many rows in a single query without leaving databases that don't support this behind :)

Patch for 6.x

There is a patch for 6.x to support multi-row inserts; http://drupal.org/node/512962#comment-3340052 (Optimize menu_router_build). To find other multi db opperations in 6.x take a peak into the code of the boost module http://drupal.org/project/boost.