Johan Broddfelt
/* Comments on code */
Donate $

MySQL to DB2

The last month I have been migrating a PHP-application from MySQL to DB2. I knew that there would be some issues with dates and that there might be some of the more specialized functions that could work a bit differently. But otherwise I expected SQL to be SQL for the most part. But boy was I wrong, the differences between SQL in databases is as big as the difference between Swedish and Danish. You can read them and understand most of it, but when you start to communicate things gets lost in translation.
In this post I'll go through most of the issues I ran into and also try to explain how I manage to make a workaround for it. Be aware that I have not done any deeper research on these things so there might be better ways to do some of these things. Let's begin.

Dates

So since I knew that dates would be an issue, let's start by examining them. In MySQL could use NULL as an indicator that the date has not been set. But when I started building on this application I was not that comfortable with NULL values and it seemed to me that most people using MySQL were using '0000-00-00' to represent that a date had not been set. DB2 does not allow this. And that is probably a good thing for data consistency. I later spent the good part of an hour trying to figure out why a table of 3000 records was not successfully inserted to a table, just to realize that one row had the date '2015-00-01'. Which is obviously not a valid date, but fully acceptable value for a date in MySQL.
It was also very convenient to write queries like WHERE `date` > 0 or WHERE `date` < 1 to find records where a date was set or not. These had to be changed to WHERE "DATE" IS NOT NULL and "DATE" IS NULL.
DATE_FORMAT(now(), '%Y-%m-%d %H:%i:%s') need to change to VARCHAR_FORMAT(CURRENT TIMESTAMP, 'YYYY-MM-DD HH24:MI:SS') and if you like you can read more about the conventions of DB2 date formatting at ibm. TO_DAYS({date}) should be changed to DAYS({date}) and if you want to make a DATEDIFF({date1}, {date2}) it easy to think that TIMESTAMPDIFF({UNIT_OF_TIME}, CHAR(TIMESTAMP({date1}) - TIMESTAMP({date2}))) but you would most certainly be mistaken since this function does not consider leap years and only assumes 30 days per month. So you are much better of using (DAYS({date1} - DAYS({date2})), at least if you are counting days.

Time

If you want to turn an integer of seconds from column sec_col into time, like "01:12:22" you do SEC_TO_TIME(sec_col) in MySQL if you want to do the same in DB2 you have to do some more REPLACE(CHAR(TIME('00:00:00') + sec_col seconds, ISO), '.', ':'). If you would like to turn a time back into an integer of seconds you do TIME_TO_SEC(time_col) in MySQL. In DB2 you use MIDNIGHT_SECONDS(time_col), but this function only return seconds since midnight. If you want to calculate time over midnight or over several days you also need to use the DAYS()-function and multiply the result by 86400 (24h * 60m * 60s) and add that to the MIDNIGHT_SECONDS result.

Format convention

In MySQL I like to write like SELECT `col1`, `col2` FROM `table1` WHERE `col3` = 1 but first of all DB2 does not use the `, instead you need to use " and DB2 changes all column names and tables names to uppercase if you do not use the ". So I accepted that I need to write all names in uppercase. Which means that a regular SQL-statement like this would look like select "COL1", "COL2" from "TABLE1" where "COL3" = 1. This means that I would have to rewrite all my queries. Luckily I can trust that DB2 will turn all my columns to uppercase, so I only need to make sure that the reserved words are in uppercase with quotes. The words I found are "TYPE", "DATE", "MONTH", "KEY", "RECORD", "PARAMETERS", "LANGUAGE", "GROUP", "INDEX", "FROM", "TO", "COMMENT", "PATH", "PUBLIC", "VALUE", "USER", "NUMBER". Most of them are quite obvious, but the list is a bit longer than it is in MySQL. For a complete list check out the IBM DB2 documentation.

Indexes

Another lower level feature that differs is the indexes, in MySQL you create indexes on a table level while in DB2 all indexes are created in one list and the names may not conflict, so if you have 2 tables, each with an index called name on the name column, you need to make sure that they have different names before adding them to your DB2 database. While doing the data migration, running the DB2 migration tool, I also ran into another issue with indexes. When the tool was converting multi column indexes from MySQL in order to build CREATE statements for the DB2 it split up the multi column index to one separate index for each column, so I had to manually rewrite all those indexes by hand later.

Number of rows

When in PHP you normally use num_rows() you are not advised to use the built in db2_num_rows(). Because it will not return the correct number of rows if you use the devault cursor DB2_FORWARD_ONLY. So if you do not want to use the DB2_SCROLLABLE curson you should instead create your own query that checks the number of rows in the dataset. But there is another option, you could use the db2_set_option and activete the option rowcount DB2_ROWCOUNT_PREFETCH_ON in order to get the db2_num_rows() to work as expected.

Escape characters

When doing an insert with a single quote (') in a string like 'Tony's Pizza' you need to add an additional single quote besides the one you want to escape like this 'Tony''s Pizza' instead if 'Tony's Pizza' as you do in MySQL. And if you want to use the same string in a WHERE clause you cannot rely on neither of the additional single quote nor the backslash but instead you need to add ESCAPE '{character of choice}' after the string. So it could look like 'Tony#'s Pizza' ESCAPE '#'

Where 1

In MySQL you often see WHERE 1, it is used in the default query in PhpMyAdmin and it is very convenient if you want to add more criteria's with code then you just need to concatenate " AND COL1 = X " or similar for each criteria you want to add. But DB2 does not accept 1 as a filter so you need to rewrite it to WHERE 1=1.

Distinct CLOB

You cannot use DISTINCT with CLOBs unfortunately so you need to cast them to varchar like this "SELECT R.`info_box`, R.`customer_text`..." to "SELECT CAST(R.`info_box` AS VARCHAR(32000)) AS INFO_BOX, CAST(R.`customer_text` AS VARCHAR(32000)) AS CUSTOMER_TEXT..."

Group by

In MySQL you can do a GROUP BY and just expect the database to select values from columns that is not in the GROUP BY-clause. But in DB2 you need to specify every column that should be grouped, even if you know that all the values are the same, because they come from the same table. SELECT id, name FROM tbl GROUP BY id must be written as SELECT ID, NAME FROM TBL GROUP BY ID, NAME

Group_Concat

GROUP_CONCAT takes a list of items and groups it into a comma separated list. When searching the net, most of the answers you get stat that you need some recursive sql or solve ti using cursor. And that is probably because the feature was first added to DB2 in version 9.7, june 2011. It is called LISTAGG.

If Then Else

MySQL has an if statement that looks like IF ({condition}, {then}, {else}) that you could use like IF (A=1, 1, 0) AS col. In DB2 there is an IF statement but I never really got it to work and it seems like it is a general consensus on the internet that on should use CASE WHEN {condition} THEN {then} ELSE {else} END instead like CASE WHEN (A=1) THEN 1 ELSE 0 END AS COL.

Concat

Like MSSQL, DB2:s CONCAT function can only take 2 parameters unlike MySQL that kan handle an infinate amount. So CONCAT(COL1, ' ', COL2) has to be rewritten as CONCAT(CONCAT(COL1, ' '), COL2) so that every CONCAT statement only has two parameters. But MySQL also has a function called GROUP_CONCAT that can take several lines and group them and show all values of the lines that where grouped as a comma separated list. I'm still looking for a good solution for that in DB2.

Sorting

WARNING! If you are used to be able to write SELECT (cost_a + cost_b) AS total FROM cost_tbl ORDER BY total you are in for a treat. You cannot sort by calculated values. In DB2 you must sort in a in an additional query like this SELECT total FROM (SELECT (cost_a + cost_b) AS total FROM cost_tbl) A ORDER BY total. So, you have to split it up to an inner query that fetches the data and an outer query that sorts it. Just renaming a column does not have this effect though. This still works SELECT cost_a AS cost FROM cost_tbl ORDER BY cost.

Another thing you should be aware about when switching from MySQL is that DB2 is case sensitive which means that your list (Apple, banana, Citrus) will be sorted (Apple, Citrus, banana) because you used a lowercase b in banana. You can solve this by using UPPER or LOWER on the text and rewrite your query like SELECT NAME FROM TBL_FRUIT ORDER BY UPPER(NAME). You might also want to create a separate upper case index to speed up the queries that needs Case-Insensitive data in DB2.

LIMIT

DB2 Express can handle LIMIT x by default, but when you upload your code to a production environment it may fail anyway. That is becuase there is a compability vector called MYS that need to be set. It is set by default in the Express version. If you do not want to do that you should use FETCH FIRST x ROWS ONLY in order to limit your dataset. But if you want to use the OFFSET feature like LIMIT 3, 10 you need to use ROWNUM instead like this WHERE ROWNUM BETWEEN 10 AND 12 in order to get the same result. ROWNUM is a built in column that is provided by DB2 on all datasets.

Conversion function in PHP

Since I run all my queries by the same functions to validate and clean if needed modify the queries it was easy to add an additional function that could convert 80% of my queries to a DB2 like format, so that I did not have to rewrite every query in the entire system. Here you can see the code that converts most of the regular MySQL queries to DB2.

static function db2FixQuery($qry) {
    // Make sure that all column and table names that are within `` are changed to uppercase
    // and the `are changed to '
    $qry = preg_replace_callback("/([,|s|(|=|<|>|.])`([^`]+)`/",
        function ($m) {
            return $m[1] . '"' . strtoupper($m[2]) . '"';
        },
    $qry);

    // Make sure all zero dates are changed into NULL
    $qry = str_replace('"0000-00-00"', 'NULL', $qry);
    $qry = str_replace('"0000-00-00 00:00:00"', 'NULL', $qry);
    $qry = str_replace("'0000-00-00'", 'NULL', $qry);
    $qry = str_replace("'0000-00-00 00:00:00'", 'NULL', $qry);

    // Make sure that DB2 handles escaped _ and % correctly
    $qry = preg_replace("/('[^']*_[^']*')/s", "$1 ESCAPE '' ", $qry);

    // Change FROM "tbl_name" to FROM "TBL_NAME"
    $qry = preg_replace_callback("/ FROMs([^s]*?)s/is",
        function ($m) {
            return " FROM " . strtoupper($m[1]) . " ";
        },
    $qry);

    // Fix if cases from if(a=1, 3, 5) AS b to if (a=1) then b=3 else b=5
    // TODO: this does not work well on nested if-cases, fix them manually
    $qry = preg_replace_callback("/IFs?(([^(]+)sASs([^s]+)/is",
        function ($m) {
            $p = explode('(', $m[1]);
            $s = explode(')', $p[1]);
            $elemArr = explode(',', $s[0]);
            return "CASE WHEN " . $elemArr[0] . " THEN " . $elemArr[1] . " ELSE " . $elemArr[2] . " END AS " . $m[2] . " ";
        },
    $qry);

    // If not done already make sure that all column names are uppercase
    $qry = preg_replace_callback("/SELECTs(.*?)sFROM/is",
        function ($m) {
            return "SELECT " . strtoupper($m[1]) . " FROM";
        },
    $qry);

    //preg_match("/(.+?)VALUES/", $qry, $m);
    // Make sure that colums ar formated "COL1", "COL2"... instead of `COL1`, `COL2`...
    // Change values from "Data", "More data" to 'Data', 'More data'
    $m = explode(' VALUES ', $qry);
    $m[0] = strtoupper($m[0]);
    if (count($m) > 1) {
        $cnt = 0;
        foreach ($m as $str) {
            if ($cnt > 0) {
                $m[$cnt] = str_replace('"', "'", $m[$cnt]); // TODO: Fix so that this function only changes the correct appostrophs...
                $m[$cnt] = str_replace('"', '"', $m[$cnt]);
            }
            $cnt++;
        }
        $qry = implode(' VALUES ', $m);
    }
    return $qry;
}

Slow connection

When the system was converted and running we realized that it was really slow. We are still working on this issue, but some of the solutions I found along the way are these:

  • There might be issues with a bug in IBM's TCP/IP stack. This should be managed by running the following commands.
    $ su - db2inst1
    $ db2set DB2TCPCONNMGRS=1
  • "ACTIVATE DB dbname" opens a "fake" permanent connection to the database to keep it in an active state. Subsequent connections will be made much faster. Note that you'll need to use "DEACTIVATE DB dbname" before performing any offline tasks (e.g. an offline backup).
  • You can also change the application to use persistent connections / connection pooling. This can be done by using db2_pconnect() instad of db2_connect(). But this should give more or less the same result as ACTIVATE DB.
  • Slow connection when connecting from a DB2 Windows client.
  • Improve DB2 Connect Performance
  • Summary

    I am not completely done with the migration project and there are some things about tablespaces, auto increment and system tables that I have not mentioned here, because I feel that they do not create any conflicts with the queries in my application. It is just a different way of doing things at the backend. I wanted to get this post out there and I figure that if there are any more issues before I'm done it will mostly just be more of the same. After reading this post you might think why I set out to make this change. The initial reason was that it was a demand from the customer, but I can see that it is not all bad, the strict nature of the DB2 database is forcing me to be more consistent with data and less flaws in the data is a good thing. Yes, MySQL is more excusing and it has some more cool functions, but DB2 takes control and hopefully it will make the application faster.

    - MySQL IBM DB2 Database Migration

    << WS with Basic Auth and REST

    Comment

    Name
    Mail (Not public)
    Send mail uppdates on new comments

    Comments

    0 post found