Migrating 4,000 Image Posts from Tumblr to WordPress
Last spring, I imported my entire Tumblr archive to this website. I’ve been sitting on this post though, because it’s way more technical than my usual, and for most people, this task should actually be pretty easy… WordPress’s default importer tool supports Tumblr and is pretty intuitive to use. I think this was the case even before Automattic acquired Tumblr.
For basic WordPress setups, the default importer should work fine, though you may need to run it a few times to go through thousands of image posts.
For everyone else, please learn from my mistakes. :’)
Note: You’ll get the most out of this post if you have working knowledge of WordPress infrastructure and MySQL.
Tumblr Import to Custom Post Type
I had a fresh WordPress install on my local computer (XAMPP server), and the Tumblr import process for 4,000 mostly-image posts went flawlessly. Images were all downloaded and metadata (dates and tags) was preserved without issue. The import took about a day, but I don’t remember how many hours it was total.
All of my problems arose after this initial import because I’m picky and particular and also have too many disparate pieces to my website.
I didn’t import the Tumblr archive directly to my live site because 1) it’d have been a bad time if the import went poorly, and 2) the default importer only supported importing items as posts.
Since it began as a blogging platform, WordPress by default supports two[1] post types: posts and pages. Posts are part of the blog. Pages make up the rest of the site. Tumblr is also a blogging platform and follows a similar format with posts and pages. I’ve always called my Tumblr a sketchblog because it was basically just a visual dumping ground instead of a textual dumping ground.
Because I use the blog on my site as an actual blog, I didn’t want to mix the blog-blog with the sketchblog. So the Tumblr archive would need to be imported as a different post type. You can create and define as many new post types as you want with WordPress, so I planned to import the archive to the live site as a “sketch” post type.
[1] “Posts” and “pages” are the two default post types that are obvious to users, but WordPress also stores a bunch of other things in its “posts” table, including attachments/images, revisions, and more.
Famous Last Words
Theoretically, this should’ve been straightforward:
- Import Tumblr archive to Local Install using default importer
- Define custom post type sketch in site theme
  - This can be done using a plugin, too
- Change all imported items from post type post to post type sketch
  - This can be done using a plugin, but that would’ve been semi-manual, page-by-page conversion
  - Since I imported into a clean install, I just ran
    UPDATE wp_posts SET post_type = REPLACE(post_type, 'post', 'sketch');
- Do some tag cleanup and categorisation (Tumblr has tags but not categories)
- Fiddle around with layout stuff related to the new post type
- Update Live Site to define post type sketch and related layout files
- Export archive from Local Install via default exporter
- Import archive to Live Site via default importer
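The REPLACE in the conversion step is a substring substitution, so it is only safe when no other post_type value contains the string post, which seems to have held on the clean install. Here is a minimal sketch of an exact-match alternative, run from Python against an in-memory SQLite table so it can be tried without touching a real database. The table is a two-column stand-in for wp_posts, and blog_post is a made-up custom post type included only to show the difference.

```python
import sqlite3


def convert_post_type(conn, old="post", new="sketch"):
    """Retag rows whose post_type is exactly `old`; return how many changed."""
    cur = conn.execute(
        "UPDATE wp_posts SET post_type = ? WHERE post_type = ?", (new, old)
    )
    return cur.rowcount


# Two-column stand-in for wp_posts; 'blog_post' is a hypothetical custom type.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_type TEXT)")
conn.executemany(
    "INSERT INTO wp_posts VALUES (?, ?)",
    [(1, "post"), (2, "page"), (3, "attachment"), (4, "blog_post")],
)

# What the substring version would produce, previewed without modifying anything.
preview = [
    row[0]
    for row in conn.execute(
        "SELECT REPLACE(post_type, 'post', 'sketch') FROM wp_posts ORDER BY ID"
    )
]

# The exact-match version only touches rows that are plain posts.
changed = convert_post_type(conn)
types = [row[0] for row in conn.execute("SELECT post_type FROM wp_posts ORDER BY ID")]

print(preview)
print(changed, types)
```

The same WHERE post_type = 'post' condition should carry over to MySQL unchanged.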
Unfortunately, it turns out WordPress’s default WordPress-to-WordPress importer sucks. Part of this is the well-documented issue of trying to import a huge number of images to a site. The Tumblr archive comprised ~4,000 posts, but many posts had multiple images attached. I should’ve counted beforehand, but let’s say it’s like 6,000 images.
Once the archive was imported to the Local Install, this ballooned fivefold to something like 30,000 images because WordPress auto-generates thumbnails and cropped sizes for different uses around the site. So the import to the Live Site involved uploading 30,000 images.
The default importer basically failed to import any images due to server load, but it also mangled the 4,000 posts and somehow imported 6,000 posts? But the duplicates were scattered everywhere and difficult to parse out. It didn’t help that Tumblr image posts don’t have titles, so 95% of the archive is untitled, which means most “find duplicate posts” tools don’t work, since they rely on unique titles. (I guess comparing actual post content is more difficult and load-heavy.)
I repeated the import a few times trying to get it to work, purging the sketch posts after each failure with
DELETE FROM wp_posts WHERE post_type = 'sketch';
and then using DB cleaner to purge associated metadata.
The Manual Route
Eventually, I decided to just go the manual route, which is to export and import data via MySQL and to manually transfer all the images via FTP.
This is the part where I realise I really should’ve imported the Tumblr archive to a cloned, staging copy of the Live Site instead of to a fresh install. I could’ve still mass-converted imported posts into sketches using a different criterion like
WHERE ID > whatever the largest post ID was before the import or something.
But since I imported into a clean install, I now had a bunch of ID conflicts between the Local Install and the Live Site, since I obviously have a bunch of existing posts on the Live Site. All posts, regardless of type, are stored in the wp_posts table, so IDs are unique across all post types. This means that if I have a post with ID 45, I can’t also have a sketch with ID 45, because a sketch is still a “general” post, even if it is of post type sketch.
So, on the Local Install, I then:
- Incremented all post IDs by… 17400, which put them above all current post IDs on the Live Site.
UPDATE wp_posts SET ID=ID+17400 ORDER BY ID DESC;
- Incremented all post parents by 17400. (Because images/attachments are also “posts” and I needed to ensure images remained attached to the correct posts.)
UPDATE wp_posts SET post_parent=post_parent+17400 ORDER BY post_parent DESC;
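The ORDER BY ... DESC in the ID statement matters in MySQL because uniqueness is checked row by row: moving the highest IDs first means a shifted ID never lands on a row that has not moved yet. One detail worth double-checking is post_parent, where WordPress uses 0 to mean “no parent”; a blanket increment would turn those zeros into 17400. Below is a minimal sketch of the same two steps with that case guarded, using a three-column SQLite stand-in for wp_posts with made-up rows. It illustrates the idea rather than replaying what was actually run.

```python
import sqlite3

OFFSET = 17400  # the offset used in the post


def shift_post_ids(conn, offset):
    """Move every ID up by `offset` and keep attachments pointing at their parents."""
    # Only real parent references move; 0 stays 0 so top-level posts stay top-level.
    conn.execute(
        "UPDATE wp_posts SET post_parent = post_parent + ? WHERE post_parent <> 0",
        (offset,),
    )
    # In this demo the offset exceeds every existing ID, so a shifted ID cannot
    # collide with a row that has not been moved yet.
    conn.execute("UPDATE wp_posts SET ID = ID + ?", (offset,))


# Stand-in table: one sketch post with two attached images (illustrative data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_parent INTEGER, post_type TEXT)"
)
conn.executemany(
    "INSERT INTO wp_posts VALUES (?, ?, ?)",
    [(1, 0, "sketch"), (2, 1, "attachment"), (3, 1, "attachment")],
)

shift_post_ids(conn, OFFSET)
rows = conn.execute(
    "SELECT ID, post_parent, post_type FROM wp_posts ORDER BY ID"
).fetchall()
print(rows)
```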
That’s enough to take care of the import conflict into wp_posts, but I also needed to preserve tag/category associations… which, of course, had ID conflicts of their own since there were already numerous tags and categories on the Live Site, too. And so:
- Incremented all term (tags/categories) IDs by 260.
UPDATE wp_terms SET term_id=term_id+260 ORDER BY term_id DESC;
- Incremented term relationships accordingly.
UPDATE wp_term_relationships SET term_taxonomy_id=term_taxonomy_id+260 ORDER BY term_taxonomy_id DESC;
UPDATE wp_term_relationships SET object_id=object_id+17400 ORDER BY object_id DESC;
- And term taxonomies.
UPDATE wp_term_taxonomy SET term_taxonomy_id=term_taxonomy_id+260 ORDER BY term_taxonomy_id DESC;
UPDATE wp_term_taxonomy SET term_id=term_id+260 ORDER BY term_id DESC;
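The term shifts can be sketched the same way. One column the statements above leave alone is wp_term_taxonomy.parent, which holds the term_id of a parent category (0 for none); if any of the imported categories are nested, it presumably needs the same offset. The tables below are trimmed SQLite stand-ins with made-up terms, and the function name is hypothetical.

```python
import sqlite3

TERM_OFFSET = 260    # term offset used in the post
POST_OFFSET = 17400  # post offset used in the post


def shift_term_ids(conn, term_offset, post_offset):
    """Apply the term and post offsets across the three term tables."""
    conn.execute("UPDATE wp_terms SET term_id = term_id + ?", (term_offset,))
    conn.execute(
        "UPDATE wp_term_taxonomy "
        "SET term_taxonomy_id = term_taxonomy_id + ?, term_id = term_id + ?",
        (term_offset, term_offset),
    )
    # parent refers to a term_id; 0 means "no parent" and must stay 0.
    conn.execute(
        "UPDATE wp_term_taxonomy SET parent = parent + ? WHERE parent <> 0",
        (term_offset,),
    )
    conn.execute(
        "UPDATE wp_term_relationships "
        "SET term_taxonomy_id = term_taxonomy_id + ?, object_id = object_id + ?",
        (term_offset, post_offset),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE wp_terms (term_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE wp_term_taxonomy (
        term_taxonomy_id INTEGER PRIMARY KEY,
        term_id INTEGER, taxonomy TEXT, parent INTEGER
    );
    CREATE TABLE wp_term_relationships (
        object_id INTEGER, term_taxonomy_id INTEGER,
        PRIMARY KEY (object_id, term_taxonomy_id)
    );
    INSERT INTO wp_terms VALUES (1, 'art'), (2, 'comics'), (3, 'wip');
    INSERT INTO wp_term_taxonomy VALUES
        (1, 1, 'category', 0), (2, 2, 'category', 1), (3, 3, 'post_tag', 0);
    INSERT INTO wp_term_relationships VALUES (10, 2), (10, 3);
    """
)

shift_term_ids(conn, TERM_OFFSET, POST_OFFSET)
term_ids = [r[0] for r in conn.execute("SELECT term_id FROM wp_terms ORDER BY term_id")]
taxonomy = conn.execute(
    "SELECT term_taxonomy_id, term_id, taxonomy, parent "
    "FROM wp_term_taxonomy ORDER BY term_taxonomy_id"
).fetchall()
relationships = conn.execute(
    "SELECT object_id, term_taxonomy_id "
    "FROM wp_term_relationships ORDER BY term_taxonomy_id"
).fetchall()
print(term_ids, taxonomy, relationships)
```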
So yay! Now I could export and import all the relevant tables as SQL files without conflicts. And eventually, all my images were uploaded via FTP. Cool! Done, right?
One Last Thing
Despite image attachments being in and defining their parent post in the wp_posts table, and despite all of the posts “knowing” which images should be displayed, thumbnail images for posts weren’t showing up properly.
This was particularly maddening because in the edit view for sketch posts, I could see the images properly and could see that the correct images were attached to the sketch, but viewing the sketch post on the front end would not show any images at all. The gallery shortcode was rendering nothing. Also, none of the imported images showed up in WordPress’s Media Library.
This is because I did not import anything from wp_postmeta. Doing so didn’t help though…
The meta_key “_wp_attached_file” defines the file path to the attachment with a given ID, and for some reason this is needed to pull the correct thumbnails (even though the file path defined is for the full file… not the thumbnail). Terminology in this table is very confusing since it’s easy to forget that attachments are also “posts,” so the post ID refers to the attachment ID, not the ID of the post the attachment is attached to.
For some reason, incrementing all post_ids by 17400 here and importing them resulted in some weird shenanigans… incorrect thumbnails and associated images would show up on posts, but when you went into the edit view, the correct images were still attached. I ended up purging the imported wp_postmeta stuff because it was too confusing trying to cross-reference IDs to figure out what was going on.
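A guess at one possible cause, not a diagnosis: some wp_postmeta rows store a post ID inside meta_value as well as in post_id. The featured-image key _thumbnail_id is one such case, since its value is the attachment’s ID. If the imported posts had featured images, shifting only post_id would leave those values pointing at whatever Live Site post holds the old number. The sketch below shifts both, on a SQLite stand-in for wp_postmeta; the rows and the file path are made up for illustration.

```python
import sqlite3

OFFSET = 17400  # the post ID offset used in the post


def shift_postmeta(conn, offset):
    """Shift post_id, plus meta values that are themselves post IDs."""
    # _thumbnail_id stores an attachment ID as text, so it needs the same offset.
    conn.execute(
        "UPDATE wp_postmeta "
        "SET meta_value = CAST(CAST(meta_value AS INTEGER) + ? AS TEXT) "
        "WHERE meta_key = '_thumbnail_id'",
        (offset,),
    )
    conn.execute("UPDATE wp_postmeta SET post_id = post_id + ?", (offset,))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_postmeta ("
    "meta_id INTEGER PRIMARY KEY, post_id INTEGER, meta_key TEXT, meta_value TEXT)"
)
conn.executemany(
    "INSERT INTO wp_postmeta VALUES (?, ?, ?, ?)",
    [
        (1, 1, "_thumbnail_id", "2"),                       # sketch 1 features attachment 2
        (2, 2, "_wp_attached_file", "2014/05/sketch.jpg"),  # hypothetical file path
    ],
)

shift_postmeta(conn, OFFSET)
meta = conn.execute(
    "SELECT post_id, meta_key, meta_value FROM wp_postmeta ORDER BY meta_id"
).fetchall()
print(meta)
```

meta_id is wp_postmeta’s own auto-increment key, so on a real import it would also have to be offset or left out of the export to avoid clashing with existing rows.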
So the problem was now that I needed to regenerate the missing attachment metadata, but I couldn’t just import that data.
There are some options for this using WP functions like wp_generate_attachment_metadata(), but making db fixes in functions.php is messy. wp media regenerate via command line was apparently also an option, but I ended up using a plugin called Fix Media Library, lol. Despite being several years out of date, it worked like a charm. It took about 12 hours altogether to process and regenerate metadata for tens of thousands of images, but it didn’t hang often, and when it did, it was easy enough to resume.
So yeah, that was my ordeal.
- Clean installs are cool for initial testing, but the staging environment should be as close to live as possible, dumbass
  - This is so damn basic, but it would’ve saved me a lot of trouble, lol
  - It was all extra dumb because I do have a local clone of the live site, but I didn’t want to mess that up either, even though that’s literally the point of it.
  - And after the initial import to the clean install, I did a bunch of tag clean-up and categorisation work, and I didn’t want to have to re-import the Tumblr to the “real” staging site and redo all that. (I had ~2,500 Tumblr tags and pared it down to 1,000…)
- Importing images sucks
  - Moving ~4 GB worth of images around is a pain no matter what.
- WordPress’s default importer needs some hella updates
  - It’s sad because there was a v2 importer in development from seven years ago that was supposed to eventually replace the existing importer, but that never happened? And also the v2 importer still doesn’t work that great.
Anyway, I hope this is helpful to someone. :’)
I think Tumblr remains one of the least terrible centralised platforms in large part because they do make it possible to take your stuff with you when/if you want to leave. The Tumblr-to-WordPress importer works nicely, and from there, there exist various WordPress-to-OtherPlatform importers, if you wanna move further.
You can request an archive of your tweets, but it’s not in a format you can do much with. Everyone abandoning that ship right now is abandoning years of work, much of which was probably never posted elsewhere. That’s a shame, though to be honest, it’s not like Twitter ever made it easy to find old posts anyway.