Friday, March 11, 2011

Transfer Sybase SQL Anywhere IMAGE Binary data to PostgreSQL BYTEA, Part1 (using CSV)

As an update to my previous post, easy way to import images to PG, where I wanted to migrate a sybase image column to a postgresql bytea column. The method I explained there requires the image file be present in order to import it to the postgres database. But in cases where the images or documents are only available in an IMAGE column of a sybase database, I have used two methods to transfer them to a PG database.

1. From a CSV created from the sybase table
2. Using Pentaho Kettle Data Integration.

I am explaining the first method in this post, Importing the image data using CSV.

Step 1 (Export the sybase table to CSV)


First of all, we need to export the required data of the sybase table to a csv. Note that my table in sybase looks like below,

images
======
image_ID integer,
image_name varchar(50),
image_data image

Before exporting the data in the table, the images table, the image_data column should be converted to ASCII string, encoded with base64. This can be done with the inbuilt function BASE64_ENCODE() which will return a base64 string. Check the following query to get the required data set that can later be exported to csv,

SELECT image_ID, image_name, BASE64_ENCODE(image_data) FROM images.

Note that in the query, the image_data field is encoded with 64 with the above function.

Step 2 (Import csv to PostgreSQL)

Once the csv is ready, import it to the postgresql database. Note that the image_data field should be imported to a string type field of the postgresql database table. Check the table I have in PG below,

images
======
image_ID int,
image_name character varying(50),
image_text text
image_data bytea

The copy command that imports the csv into the PG table may look like below,

COPY images FROM 'C:\\PG\\images.csv' WITH QUOTE '''' CSV

Note that if the text data in your csv is quoted, then the relevant quote should be provided in the COPY command to prevent the same quote be copied into the table alone with the data. For example, if you didn't provide the QUOTE in the COPY command, you will have data like, 'company_logo', in the table. This should be avoided, specially for the base64 string.

Step 3 (Update bytea column with base64 string in the text column)

Once we have data in the images table, images saved in the text column encoded with base64, we need to update the bytea column of the same table by decoding data in the text column into the bytea format. Check the following script I have used in this example,

UPDATE images SET image_data = DECODE(image_text, 'base64');

This script decodes data in the image_text column and updates the image_data, which is a bytea column, with the correct binary data of the particular image. You can clear the image_text column later to save addition space.

It's easy as that and check out the next post, which is the PART 2 of this post, to see how we can directly transfer image data from sybase to postgresql using Pentaho Data Integration ETL package.

No comments: