Importing Drupal 6 nodes into a Drupal 7 site

Note: I wrote this back in August of last year but never published it. I never finished writing it up either, but one can check out the forum link below for some more information.

So, with the Drupal 7 alpha releases available, I decided to try upgrading a Drupal 6 site to Drupal 7 (Alpha release 6). I had already built a little Drupal 7 site for a local school activist group, but that was back with Alpha 2, and at that time I decided to wait a bit before messing with 7 again. The goal was to upgrade my blog at http://www.nowarninglabel.com and move it to its new home at http://www.coderintherye.com.

So, I plunged into what I thought might be a fairly simple upgrade path. 60 blog nodes and 5 page nodes, no problem, right? Wrong! I found it impossible to upgrade my D6 site to D7 on the host where it lives. Errors abounded, and I gave up on the 2nd night of trying to squash them. I will say that it is possible many of the errors were due to my host only providing 32MB of memory for PHP. However, in D6, when you ran out of memory the error usually made that obvious, or you got a white screen of death. In this case I experienced neither, only cryptic error messages which I would track down, change some lines, rebuild the database, and then hit the same problems all over again. I'll give upgrading a D6 site to D7 another go when the beta releases are out and the migration path is clear.

So, option 2 was exporting the nodes and importing them into a D7 site. Easy, right? Well no, not exactly. There are no stable releases yet that provide import/export functionality between a D6 site and a D7 site without coding (as of August 20th, 2010, that I know of). The Migrate module may offer that ability, but it requires coding up somewhat complex migration files. So, the next option was to do a manual export/import. The biggest hurdle to overcome was understanding the new Field API for D7. The Field API is a great piece of work and supersedes CCK from D6. However, it has a lot more interoperability than CCK alone had. This leads us to the first important change:

There are no longer functional body and teaser attributes to nodes in Drupal 7. These are now handled by the Field API. In short, body and summary now act as true fields (as if you had added them via CCK in D6).

Because of this change it is important to properly address those fields in the $node object when calling node_save() to create your new nodes. This is where I went wrong and thrashed away for an evening, before giving up and manually importing the SQL export data from my D6 blog. This leads us to the 2nd important change. There are numerous new tables in D7, but the two that concern us are field_data_body and field_revision_body. These two tables contain the body and summary data that was previously held in the node and node_revisions tables. You will use all 4 of these tables to import your D6 data into D7. I handcrafted the SQL to import into the new tables, but thanks to some helpful advice from lorindpa in the drupal.org forums I now know the proper code to use.
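For reference, here is a rough sketch of how one D6 node_revisions row might map onto a row for the two body tables. The helper name d6_revision_to_body_row() is made up for illustration, and the column names are the ones from the final Drupal 7 release, which I am assuming here; the alpha schema may have differed.

```php
<?php
// Hypothetical helper (not part of Drupal): maps one D6 node_revisions row
// to the row shape used by field_data_body and field_revision_body.
// Column names assume the final Drupal 7 schema.
function d6_revision_to_body_row(array $rev, $bundle, $langcode = 'und') {
  return array(
    'entity_type'  => 'node',
    'bundle'       => $bundle,            // the D7 node type, e.g. 'blog'
    'deleted'      => 0,
    'entity_id'    => (int) $rev['nid'],  // must match the nid in the node table
    'revision_id'  => (int) $rev['vid'],  // must match the vid in node_revision
    'language'     => $langcode,
    'delta'        => 0,                  // single-value field, so always 0
    'body_value'   => $rev['body'],
    'body_summary' => $rev['teaser'],
    'body_format'  => $rev['format'],
  );
}

// Example D6 row (made-up data).
$row = d6_revision_to_body_row(
  array('nid' => 1, 'vid' => 1, 'body' => 'A node which is a blog post', 'teaser' => 'A node', 'format' => 2),
  'blog'
);
print_r($row);
```

The same row shape would go into both tables; the node and node_revision rows still need to exist with matching nid and vid values.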

So, where is the change? Basically, in order to import properly now, one has to specify the node language. If you are not using locale and haven't defined a site language, then this can be 'und', which is short for undetermined (it is the value of Drupal 7's LANGUAGE_NONE constant). If you are using locale then it would be the language code, for instance 'en' for English.
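A minimal sketch of what that language keying looks like on the $node object, following the rule described above. build_body_field() is a hypothetical helper name, not a Drupal function, and the format value 3 is just an example ID.

```php
<?php
// Hypothetical helper (not part of Drupal): wraps body text in the
// language-keyed structure that the Field API expects on $node->body.
function build_body_field($body, $summary, $format, $langcode = 'und') {
  return array(
    $langcode => array(
      0 => array(
        'value'   => $body,
        'summary' => $summary,
        'format'  => $format,
      ),
    ),
  );
}

// No locale in use: key the field with 'und'.
$node = new stdClass();
$node->language = 'und';
$node->body = build_body_field('Full text of the post', 'Short teaser', 3, $node->language);

// With locale in use, the same helper would be called with e.g. 'en'.
$english_body = build_body_field('Full text of the post', 'Short teaser', 3, 'en');
```

Note that this only builds the structure; the node still has to be passed to node_save() from within a bootstrapped Drupal.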

So, now how can we get from node export on D6 to node import on D7? The steps below will get you there with some tweaking, but ideally we can patch the node_export and node_import modules (or the import/export modules, or make Migrate more user friendly) so that the process becomes easier.

  1. Export your node_revisions table from your D6 site as a CSV. (Or any format you want, but CSV is easier to use in my opinion. Perhaps one could use node_export here as well to initially export the node data, though I have not tried.)
    • You can export your node_revisions table by opening your database in phpMyAdmin, selecting the node_revisions table, choosing Export, and then choosing CSV in the options. You will need to choose a row delimiter; I chose '___' and left the field delimiter as the default semi-colon ;
    • Alternatively, you can export from the command line, e.g., mysqldump -u root -p --tab=/tmp --fields-terminated-by=';' --lines-terminated-by='___' drupal6 node_revisions
    • Note that in the above command you need to change the database and username to the ones you use. This will export a semi-colon delimited file as node_revisions.txt in /tmp (with --tab, mysqldump writes the files itself, so no output redirect is needed).
  2. Once you have the exported nodes, you just need to clean them up so you can use them in some code. I prefer to use Notepad++ or SciTE (both of which use Scintilla as the backend) to do some regex in order to make the data easy to import.
  3. Here is an example of how one might do the import:


    <?php
    // Run this with drush (e.g. "drush scr scriptname.php") so that Drupal
    // is bootstrapped and node_save() is available.
    main();

    function main() {
      // Node data pasted in from the cleaned-up D6 export.
      $nodes = array();
      $nodes[1]['title'] = 'First post';
      $nodes[1]['body'] = 'A nodeA node which is a blog post with lots of stuff';
      $nodes[1]['teaser'] = 'A node';
      $nodes[1]['timestamp'] = 1281765101;
      // The D6 format ID is kept for reference only; the text format is
      // hard-coded to 3 in make_nodes() below.
      $nodes[1]['format'] = 2;
      // about 65 more nodes that I don't want to paste here
      make_nodes($nodes);
    }

    function make_nodes($nodes) {
      foreach ($nodes as $new_node) {
        $node = new stdClass();
        $node->type = 'blog';
        $node->status = 1;
        $node->uid = 1;
        $node->title = $new_node['title'];
        $node->promote = 1;
        $node->created = $new_node['timestamp'];
        $node->timestamp = $new_node['timestamp'];
        $node->sticky = 0;
        $node->format = 3;
        $node->language = 'en';
        // Body and summary are now fields, keyed by language and delta.
        // If the text format is not set, the body may not display.
        $node->body['und'][0]['format'] = 3;
        $node->body['und'][0]['summary'] = $new_node['teaser'];
        $node->body['und'][0]['value'] = $new_node['body'];
        $node->revision = 0;
        node_save($node);
      }
    }
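Rather than pasting every node into the script by hand, the $nodes array above could probably be built straight from the exported file. Below is a rough sketch, assuming the export uses ';' between fields and '___' between rows as in step 1, and that each field is enclosed in double quotes (the phpMyAdmin CSV default; mysqldump would need --fields-enclosed-by for that). parse_node_revisions_export() is a hypothetical helper name, and the sample data is made up.

```php
<?php
// Hypothetical helper: turns the raw export text into an array of rows
// keyed by column name, ready to hand to make_nodes().
function parse_node_revisions_export($raw, array $columns) {
  $nodes = array();
  foreach (explode('___', $raw) as $line) {
    $line = trim($line);
    if ($line === '') {
      continue; // trailing delimiter or blank row
    }
    // str_getcsv() respects the double-quote enclosure, so semi-colons
    // inside a quoted body do not split the field.
    $fields = str_getcsv($line, ';', '"', '\\');
    if (count($fields) !== count($columns)) {
      continue; // skip rows that do not match the expected layout
    }
    $nodes[] = array_combine($columns, $fields);
  }
  return $nodes;
}

// Column order of the D6 node_revisions table.
$columns = array('nid', 'vid', 'uid', 'title', 'body', 'teaser', 'log', 'timestamp', 'format');

// Made-up sample in the exported format.
$raw = '"1";"1";"1";"First post";"A node which is a blog post; with a semi-colon";"A node";"";"1281765101";"2"___'
     . '"2";"2";"1";"Second post";"Another post";"Another";"";"1281765200";"2"___';

$nodes = parse_node_revisions_export($raw, $columns);
```

With a real export, $raw would come from something like file_get_contents('/tmp/node_revisions.txt'). A body that itself contains '___' would break this simple split, so pick a row delimiter that does not appear in your content.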

Now you may be wondering, what about importing images? Well, what I decided to do was use wget to download the images and move them to the new file directory, and QueryPath to pull the src and alt attributes for the images. Something like the following:


    // Pull the image src out of the company's HTML with QueryPath.
    $src = htmlqp($company, 'img')->attr('src');
    $companies[$company_name]['logo']['src'] = $src;
    if (strpos($src, 'http://') !== FALSE) {
      // Remote image: download it into $file_path with wget.
      $current_dir = getcwd();
      chdir($file_path);
      exec('wget ' . escapeshellarg($src) . ' -O ' . escapeshellarg(basename($src)));
      chdir($current_dir);
      // Assumes $file_path ends with a trailing slash.
      $companies[$company_name]['logo']['uri'] = $file_path . basename($src);
    }
    else {
      // Not an absolute URL: use the src as the uri unchanged.
      $companies[$company_name]['logo']['uri'] = $src;
    }
    // Stored with the other logo data so the import code can read it.
    $companies[$company_name]['logo']['alt'] = htmlqp($company, 'img')->attr('alt');

And then in your node import code you can do something like the following:


    // Begin image save code
    $file_temp = file_get_contents($company['logo']['src']);
    $file_temp = file_save_data($file_temp, file_default_scheme() . '://supporter/logo/' . basename($company['logo']['src']), FILE_EXISTS_RENAME);
    $node->field_supporter_logo['und'][0]['fid'] = $file_temp->fid;
    $node->field_supporter_logo['und'][0]['uid'] = 1;
    $node->field_supporter_logo['und'][0]['alt'] = $company['logo']['alt'];
    $node->field_supporter_logo['und'][0]['filename'] = $file_temp->filename;
    $node->field_supporter_logo['und'][0]['filemime'] = $file_temp->filemime;
    $node->field_supporter_logo['und'][0]['filesize'] = $file_temp->filesize;
    $node->field_supporter_logo['und'][0]['uri'] = $file_temp->uri;
    $node->field_supporter_logo['und'][0]['timestamp'] = time();
    $node->field_supporter_logo['und'][0]['status'] = FILE_STATUS_PERMANENT;
    // End image save code

Well, I hope that provides some thoughts on how to go about this. It's all rather rough as I never fully formulated my thoughts on this process, though I did get it to work for myself in various situations. Indeed it led me to the start of an idea, http://converthtmltocms.com/, which I had hoped to turn into an easy way for people to import legacy sites into a CMS. Maybe one day I'll fully flesh that out as well as come back to this blog post, but in the meantime if you have questions or corrections, please leave a comment.

Comments

using your code

hi, thanks for the info, however how do you use your code? in Notepad++? php scripts? thxs

undefined function error

Hi,
I went through all these steps, checked my syntax, but when I enable my module I get a fatal error: Call to undefined function node_save()
As a result my imported blog nodes seem to exist but have no bodies.

My guess here is what you are

My guess here is that what you are doing wrong is not using drush to run the script. You should run
"drush scr scriptname.php", so that it will bootstrap Drupal and the node_save function will be available.

further to my last comment....

The missing content turns out to be that the text format field did not get set. I might have messed that up - I was modifying things such as retaining the original uids (I'd already imported all the users)

How would I go about importing comments to the blog posts?

For comments, you just need

For comments, you just need to import the comments themselves using a similar method to the above, but with comment_save, and make sure you associate them with the nid of the nodes you saved. The easiest way to do this is to do both together, so you can pass the nid to comment_save when saving the comments directly after saving the nodes.

If you separate the two actions, then one easy way is using the node title to get the nid, use node_load_multiple http://api.drupal.org/api/drupal/modules%21node%21node.module/function/n... with title as the condition and thus you will get the nid of the node and can pass that to comment_save.
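A rough sketch of that second approach, assuming the field layout of the released Drupal 7 comment module. build_comment() is a hypothetical helper name, the input keys follow the D6 comments table, and the sample data is made up; treat it as a starting point rather than tested import code.

```php
<?php
// Hypothetical helper: builds a D7 comment object from a D6 comments row.
function build_comment($nid, array $old, $node_type = 'blog', $langcode = 'und') {
  $comment = new stdClass();
  $comment->nid = $nid;
  $comment->cid = NULL;                 // NULL so comment_save() creates a new comment
  $comment->pid = 0;                    // 0 = top-level comment, no parent
  $comment->uid = $old['uid'];
  $comment->subject = $old['subject'];
  $comment->created = $old['timestamp'];
  $comment->status = 1;                 // published in D7 (D6 used 0 for published)
  $comment->language = $langcode;
  $comment->node_type = 'comment_node_' . $node_type;
  // The comment text is a field too, keyed by language and delta.
  $comment->comment_body[$langcode][0] = array(
    'value'  => $old['comment'],
    'format' => $old['format'],
  );
  return $comment;
}

// Made-up D6 comment row.
$old = array('uid' => 1, 'subject' => 'Nice', 'comment' => 'Nice post', 'timestamp' => 1281765300, 'format' => 3);
$comment = build_comment(42, $old);

// Only touch Drupal when the script runs bootstrapped (e.g. via drush scr).
if (function_exists('node_load_multiple') && function_exists('comment_save')) {
  // Look the node up by title, then attach the comment to its nid.
  $matches = node_load_multiple(array(), array('title' => 'First post'));
  if ($match = reset($matches)) {
    comment_save(build_comment($match->nid, $old));
  }
}
```

Looking nodes up by title only works if titles are unique; otherwise keep a map of old nid to new nid while saving the nodes.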
