2012-06-27

Moving posts from a proprietary CMS to Blogger

My wife has been writing a blog about our family since 2008, but it's been over a year since she's made a new post. Since it was not being used much, I decided to save money on the server hosting the blog and move it over to Blogger. The old blog was running under our company's closed source CMS (PHP, MySQL) and hosted both posts and images (over 3600 of them).

I decided to upload all of them to Picasa and move the blog over to Blogger in a spare hour or two. The only reason I went with these is their simplicity and ease of use (from the user's perspective at least). It took rather longer than that and I hit a few snags along the way, so I thought it was time for a new post while I was still hurting from the experience. ;)


So in the unlikely event that someone is going down this same road, here are a few pointers to get through this faster than I did.

Interestingly enough, the part where I moved 3600 images over to Picasa was the most seamless of them all. Of course I upgraded my storage first because the images were all over 800 px wide and they would not fit into the 1GB quota. It was blazing fast too.


Exporting to Picasa



In PHP (using Zend Framework 1.10 and its Gdata API; under Ubuntu you can install it with apt-get install zend-framework if you don't already have it), uploading each image went something like this.
Make sure to replace the CAPS_VARIABLES with your own data:



// Load the Zend Framework 1.x classes (assumes the framework is on your include_path)
require_once 'Zend/Loader.php';
Zend_Loader::loadClass('Zend_Gdata_Photos');
Zend_Loader::loadClass('Zend_Gdata_ClientLogin');

// Authenticate against Picasa Web Albums
$serviceName = Zend_Gdata_Photos::AUTH_SERVICE_NAME;
$client = Zend_Gdata_ClientLogin::getHttpClient("PICASA_USER", "PICASA_PASSWORD", $serviceName);
$gp = new Zend_Gdata_Photos($client, "KonradKiss-FamilyPhotoExporterToPicasa-0.1");

// Wrap the local file as the media source of a new photo entry
$filename = "PATH_TO_PHYSICAL_JPEG_FILE";
$fd = $gp->newMediaFileSource($filename);
$fd->setContentType("image/jpeg");
$photoEntry = $gp->newPhotoEntry();
$photoEntry->setMediaSource($fd);

// Title and caption
$photoEntry->setTitle($gp->newTitle("UPLOADED_FILE_NAME"));
$photoEntry->setSummary($gp->newSummary("PHOTO_CAPTION"));

// Original date of the photo; the timestamp is passed in milliseconds
$date = "Y-m-d H:i:s FORMAT DATE";
$timestamp = new Zend_Gdata_Photos_Extension_Timestamp(strtotime($date)*1000);
$photoEntry->setGphotoTimestamp($timestamp);

// Tags: "family" plus the year taken from the date
$photoTags = "family, ".substr($date, 0, 4);
$keywords = new Zend_Gdata_Media_Extension_MediaKeywords();
$keywords->setText($photoTags);
$photoEntry->mediaGroup = new Zend_Gdata_Media_Extension_MediaGroup();
$photoEntry->mediaGroup->keywords = $keywords;

// Upload into the given album of the authenticated user ("default")
$albumQuery = $gp->newAlbumQuery();
$albumQuery->setUser("default");
$albumQuery->setAlbumName("ALBUM_NAME");
$insertedEntry = $gp->insertPhotoEntry($photoEntry, $albumQuery->getQueryUrl());

// Picasa id of the uploaded photo
$photoid = $insertedEntry->getGphotoId()->getText();


Pretty straightforward. Now, $photoid has the Picasa id of the inserted photo.

You can also simply retrieve the album's id (not name) from the returned $insertedEntry (this time using direct member variables instead of the getter methods):

$albumid = $insertedEntry->gphotoAlbumId->text;


To be able to use these images in your posts, you will want to save these ids back into your legacy data table so that they stay associated with the original image data. When you're done, each local image record can refer to a Picasa album and photo by their ids. This is important because it is how you fetch more information about a specific image from Picasa (e.g. thumbnail URLs) later on, when you're assembling posts and their content.
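
Saving the ids back could look roughly like the sketch below. The table name images and the columns id, picasa_photo_id and picasa_album_id are made up for illustration, since your legacy schema will differ. The demo runs against an in-memory SQLite database only to stay self-contained; with MySQL you would open the PDO connection with your own DSN and credentials instead.

```php
<?php
// Sketch only: table and column names are hypothetical, adapt them to your schema.
// The ids are stored as text because they are long numeric strings.
function savePicasaIds(PDO $db, $imageId, $photoId, $albumId)
{
    $stmt = $db->prepare(
        'UPDATE images SET picasa_photo_id = :photo, picasa_album_id = :album WHERE id = :id'
    );
    $stmt->execute(array(':photo' => $photoId, ':album' => $albumId, ':id' => $imageId));

    // True when exactly one row was touched
    return $stmt->rowCount() === 1;
}

// Demo with a throwaway in-memory database standing in for the legacy table
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE images (id INTEGER PRIMARY KEY, filename TEXT, picasa_photo_id TEXT, picasa_album_id TEXT)');
$db->exec("INSERT INTO images (id, filename) VALUES (1, 'example.jpg')");

$saved = savePicasaIds($db, 1, 'PHOTO_ID', 'ALBUM_ID');
```

In the upload loop you would call this right after reading $photoid and $albumid from the inserted entry.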

Before I forget, make sure your Picasa album is not set to private.

Your next step could be exporting data from the proprietary solution to a format that Blogger will actually import. At a glance, you have three ways to go:

  • You can do it manually if you only have a few posts
  • You might think of using the Blogger Web Data API
  • You could prepare an Atom XML file that Blogger can import


Don't use the Blogger Web Data API for importing many posts


You would write a new post to Blogger this way:


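// Log in to Blogger via ClientLogin
$clientBlogger = Zend_Gdata_ClientLogin::getHttpClient("BLOGGER_USER", "BLOGGER_PASS", 'blogger', null,
  Zend_Gdata_ClientLogin::DEFAULT_SOURCE, null, null,
  Zend_Gdata_ClientLogin::CLIENTLOGIN_URI, 'GOOGLE');
$gdClient = new Zend_Gdata($clientBlogger);

// Build the post
$entry = $gdClient->newEntry();
$entry->title = $gdClient->newTitle("POST_TITLE");
$entry->content = $gdClient->newContent("POST_BODY");
$entry->published = $gdClient->newPublished("POST_DATE_(Y-m-d\TH:i:s\Z)");
$entry->content->setType('text');

// Post to the blog's feed; BLOG_ID is the long number from your blog's admin URL
$uri = "http://www.blogger.com/feeds/BLOG_ID/posts/default";
$createdPost = $gdClient->insertEntry($entry, $uri);

// The returned id looks like tag:blogger.com,1999:blog-<BLOG_ID>.post-<POST_ID>
$idText = explode('-', $createdPost->id->text); // explode() replaces split(), which was removed in PHP 7
$newPostID = $idText[2];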


Should be straightforward. Keep the content type 'text' - you can still pass unencoded HTML for the body of the post.
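
The POST_DATE placeholder expects a UTC timestamp in the Y-m-d\TH:i:s\Z form. A minimal sketch of converting a legacy 'Y-m-d H:i:s' date follows; the function name toBloggerDate and the idea that your old dates were stored in a local time zone are my assumptions, so pass whatever zone your CMS actually used.

```php
<?php
// Convert a legacy 'Y-m-d H:i:s' timestamp to the UTC string used for POST_DATE.
// $sourceTimezone is an assumption: use the zone your old CMS stored dates in.
function toBloggerDate($legacyDate, $sourceTimezone = 'UTC')
{
    $date = new DateTime($legacyDate, new DateTimeZone($sourceTimezone));
    $date->setTimezone(new DateTimeZone('UTC'));

    // The backslashes keep T and Z as literal characters in the output
    return $date->format('Y-m-d\TH:i:s\Z');
}

echo toBloggerDate('2008-07-15 14:30:00', 'Europe/Budapest'), "\n"; // prints 2008-07-15T12:30:00Z
```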

Using this method is fine as long as you have fewer than 10 posts. Above that (there is no telling after how many posts) Google will unexpectedly flag you as a possible spammer and require you to prove your humanity by deciphering hard-to-read images of text and numbers. Handling that inside your program takes significantly longer than it is worth, so you had better stay away from it. If you have many posts, pretty much your only option is to prepare an XML file that Blogger can import.


Accessing Picasa images


It's relatively easy to access your Picasa image thumbnails and captions once you have the photo and album ids.


// Same service name as used for the upload
$serviceName = Zend_Gdata_Photos::AUTH_SERVICE_NAME;
$client = Zend_Gdata_ClientLogin::getHttpClient("PICASA_USER", "PICASA_PASSWORD", $serviceName);
$gp = new Zend_Gdata_Photos($client, "KonradKiss-FamilyPhotoExporterToPicasa-0.1");

// Look up a single photo by the ids saved earlier
$query = $gp->newPhotoQuery();
$query->setUser("PICASA_USER");
$query->setPhotoId("PHOTO_ID");
$query->setAlbumId("ALBUM_ID");
$photoEntry = $gp->getPhotoEntry($query);


Now, although it is not advertised, the cool part is that Google can create a thumbnail of any size you want. It generates a few by default (widths of 72, 144, 288, etc.), but you can just as well fetch any other size. First, fetch the URL of the thumbnail that is 144 pixels wide:

$thumb_url = $photoEntry->mediaGroup->thumbnail[1]->url;


It will return something like this:
http://lh3.ggpht.com/-H27uiqL4Muk/T-jPosmIBqI/AAAAAAAAIGI/hWUsnZxud7o/s144/0010%252520%2525282008-07-15%252529.jpg

The /s144/ part is what you replace to change the width of the thumbnail you fetch (/s800/ is probably a good size).


$large_image_url = str_replace("/s144/", "/s800/", $photoEntry->mediaGroup->thumbnail[1]->url);


So instead of using the original source image, I chose a thumbnail that always has the given size, and whose size I can increase relatively easily in case more people in my family start using screen resolutions wider than 1024px.
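
If you don't want to hard-code which default thumbnail you start from, a small helper can swap whatever size segment the URL happens to contain. This is only a sketch: the function name resizePicasaUrl is mine, and it assumes the URL carries a /sNNN/ segment like the example above.

```php
<?php
// Swap the first /sNNN/ size segment of a Picasa thumbnail URL for another width.
function resizePicasaUrl($thumbUrl, $width)
{
    $resized = preg_replace('#/s\d+/#', '/s' . (int) $width . '/', $thumbUrl, 1, $count);
    if ($count !== 1) {
        throw new InvalidArgumentException('No /sNNN/ size segment found in URL');
    }
    return $resized;
}

$url = 'http://lh3.ggpht.com/-H27uiqL4Muk/T-jPosmIBqI/AAAAAAAAIGI/hWUsnZxud7o/s144/0010%252520%2525282008-07-15%252529.jpg';
echo resizePicasaUrl($url, 800), "\n";
```

Raising the width for the whole blog later then means changing a single number.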


Steps to import into Blogger


There are contradicting posts and outdated docs about the whole process all over the net. I'm not going to post code for this part, because it would only be exporting your own data anyway, but hopefully the following will save you some time.

What you want to do first is create a single test post, then export your Blogger blog (see Settings > Other) and hang onto that file. Make a copy and work on the copy. In this file, every blog setting and every other piece of information is included as an entry node. Keep everything above the generator node, and below it remove all entries that are not posts. This keeps the file less bloated and prevents your blog settings from being overwritten. You can tell that an entry node is a post from its category child node: its term attribute will end with '#post'.
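
The trimming can be done by hand, but here is a rough sketch with PHP's DOM extension. The function name keepOnlyPostEntries and the tiny sample feed are made up for illustration (a real export is much larger); the only rule taken from above is that a post's category term ends with '#post'.

```php
<?php
// Remove every <entry> that is not a post from a Blogger export string.
// A post is recognised by a <category> child whose term ends with '#post'.
function keepOnlyPostEntries($exportXml)
{
    $doc = new DOMDocument();
    $doc->loadXML($exportXml);

    // Copy the live node list first so removals don't disturb the iteration
    $entries = iterator_to_array($doc->getElementsByTagName('entry'), false);
    foreach ($entries as $entry) {
        $isPost = false;
        foreach ($entry->getElementsByTagName('category') as $category) {
            if (substr($category->getAttribute('term'), -5) === '#post') {
                $isPost = true;
                break;
            }
        }
        if (!$isPost) {
            $entry->parentNode->removeChild($entry);
        }
    }
    return $doc->saveXML();
}

// Made-up miniature export: one settings entry and one post
$export = <<<XML
<feed xmlns="http://www.w3.org/2005/Atom">
  <generator>Blogger</generator>
  <entry><category term="http://schemas.google.com/blogger/2008/kind#settings"/><title>A setting</title></entry>
  <entry><category term="http://schemas.google.com/blogger/2008/kind#post"/><title>Test post</title></entry>
</feed>
XML;

$filtered = keepOnlyPostEntries($export);
```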

You can use this entire entry node as a template for your data. Once you have created your blog entries, you can paste them in place of your test post entry and save the file.
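
One way to use the entry as a template is to clone it and overwrite a few children per post. The sketch below assumes the template has id, published, title and content children, which is what the made-up sample contains; check your own export for the exact set. The function name fillEntryTemplate is hypothetical.

```php
<?php
// Clone a template <entry> and overwrite the text of selected child elements.
function fillEntryTemplate(DOMElement $template, array $fields)
{
    $entry = $template->cloneNode(true);
    foreach ($fields as $tagName => $value) {
        $node = $entry->getElementsByTagName($tagName)->item(0);
        if ($node === null) {
            throw new RuntimeException("Template has no <$tagName> child");
        }
        // Replace the old text with a text node so that & and < get escaped on save
        while ($node->firstChild) {
            $node->removeChild($node->firstChild);
        }
        $node->appendChild($entry->ownerDocument->createTextNode($value));
    }
    return $entry;
}

// Made-up miniature export holding only the test post
$doc = new DOMDocument();
$doc->loadXML(
    '<feed xmlns="http://www.w3.org/2005/Atom"><entry>'
    . '<id>tag:blogger.com,1999:blog-1.post-1</id>'
    . '<published>2012-06-27T10:00:00Z</published>'
    . '<title type="text">Test post</title>'
    . '<content type="html">Test</content>'
    . '</entry></feed>'
);
$template = $doc->getElementsByTagName('entry')->item(0);

$newEntry = fillEntryTemplate($template, array(
    'id'        => 'tag:blogger.com,1999:blog-1.post-42',
    'published' => '2008-07-15T12:30:00Z',
    'title'     => 'Summer 2008',
    'content'   => '<p>Hello & welcome</p>',
));

// Swap the test post for the generated entry
$doc->documentElement->replaceChild($newEntry, $template);
echo $doc->saveXML();
```

For many posts you would insert each generated entry before the template and remove the template at the end.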

Blogger is pretty quiet about any parsing errors that might happen. If the message "Writing blog posts" stays on forever, you probably made a mistake and your XML has a syntax error. Check it with an online XML validator.
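
If you would rather check the file locally before uploading, PHP's libxml bindings can report syntax errors. A minimal sketch (the function name xmlSyntaxErrors is mine); it only checks that the XML is well-formed, not that Blogger will accept its contents.

```php
<?php
// Return a list of parse problems for an XML string; an empty list means well-formed.
function xmlSyntaxErrors($xml)
{
    // Collect errors quietly instead of emitting PHP warnings
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $doc->loadXML($xml);

    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error handling mode
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

// The closing </title> tag is missing here, so this prints at least one message
foreach (xmlSyntaxErrors('<feed><entry><title>broken</entry></feed>') as $message) {
    echo $message, "\n";
}
```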

Another important thing is that each entry has an id child node. One employee on the API forums suggests that you can just leave this out and let Blogger generate this ID. This does not work. You need the id node defined, and you need it to have a value. I suggest you format the value like this:

tag:blogger.com,1999:blog-<BLOG_ID>.post-<POST_ID>

You can find your BLOG_ID in the URI when you are on the administration page of your blog. It's a long number. POST_ID can be anything you want - using your proprietary post id is probably a good idea.
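
Generating the id value per post is a one-liner. In this sketch the function name bloggerEntryId and the sample blog id are made up; the blog id is passed as a string because it is a long number.

```php
<?php
// Build the entry id in the tag:blogger.com,1999:blog-<BLOG_ID>.post-<POST_ID> form.
function bloggerEntryId($blogId, $postId)
{
    return sprintf('tag:blogger.com,1999:blog-%s.post-%s', $blogId, $postId);
}

echo bloggerEntryId('1234567890123456789', 42), "\n"; // prints tag:blogger.com,1999:blog-1234567890123456789.post-42
```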

Hope this saves you time.
