This guide explains how to migrate from SQL Server to Babelfish.
Babelfish is a PostgreSQL extension from Amazon that allows PostgreSQL to understand queries written in SQL Server's T-SQL syntax.
Using Babelfish may seem like an easy choice, but there are many factors to consider. In our opinion, the most important pros and cons of Babelfish are:
- Pro: Drastically reduces the amount of work that needs to be done to translate stored procedures, client queries, column types, etc.
- Con: Babelfish does not support 100% of SQL Server's features or syntax, so you must find workarounds.
- Pro: Babelfish, like PostgreSQL, is open-source. There is no vendor lock-in.
- Con: As far as we know, Amazon Aurora is the only managed database that offers the Babelfish feature, so you will have to use Aurora, or run Babelfish yourself.
If your organization is on AWS and has historically used SQL Server as your main relational database, then it may make sense to investigate Babelfish. However, if your team has lots of PostgreSQL experience and you only have a few SQL Server databases, then it may be worth it to homogenize your data stack on PostgreSQL. In the end, the choice is really up to you.
This guide assumes you have decided to use Babelfish, instead of going directly to PostgreSQL. If you are still in the initial research phase of your project, you can also check out our SQL Server to PostgreSQL Migration Guide.
High level steps
The high level steps are:
- Preparation / decision making
- Identifying non-compatible features
- Schema translation
- Data movement
- Data comparison
- Final changeover
Preparation / decision making
The first part of the process is high level research. You should analyze your application and identify parts that don't need to be migrated.
You should also look into which SQL Server-specific features your application relies on, and whether alternatives exist in Babelfish or PostgreSQL.
The point of this exercise is to make a yes/no decision on whether this project is worth your time, energy, and money.
Identify non-compatible features
A useful tool that can provide valuable insights into the scope of the required work is Babelfish Compass (short for “COMPatibility ASSessment”). It was specifically designed to analyze SQL Server DDL code and identify features that are not compatible with Babelfish. Compass also has the ability to automatically rewrite some T-SQL constructs unsupported by Babelfish, such as the MERGE statement. We have an AWS Babelfish Compass Guide that will show you how to get started with Compass.
It is important to note that while the team behind Babelfish continuously adds new features, it does not cover 100% of SQL Server's functionality.
Most non-trivial applications make use of some SQL Server feature that isn't supported by Babelfish. However, an alternative may be available in PostgreSQL. Common examples of these features include:
- DDL Triggers
- XML-related functionality
- Various ALTER statements (e.g. ALTER DATABASE, ALTER SCHEMA, ALTER PROCEDURE, etc.)
In cases where such functionality is essential to your application, you can resort to workarounds that combine Babelfish and plain PostgreSQL. For instance, to make partitioning work in Babelfish, you could create the respective partitioned tables in PostgreSQL using pgAdmin, then change the owner of the tables to dbo so that they can be accessed on both the Babelfish port and the PostgreSQL port.
Another example is the use of XML-related functionality in stored procedures. While Babelfish does not currently support XML operations, PostgreSQL offers similar functionality (although SQL Server's integration is more extensive). In this case, you could recreate the affected stored procedures directly in PostgreSQL using its system-specific functions.
Give your app a haircut
There are almost always schemas, tables, and sometimes even entire databases that are no longer in use. For example, a recent client of ours claimed they needed to migrate 25 databases, but upon further investigation, fewer than 20 were actually used in production.
We use a few ad-hoc SQL scripts to determine how often database objects are actually used - stored procedures, tables, etc. It is essential to do that homework ahead of time so you don't waste effort moving things that aren't needed.
It is especially helpful to identify stored procedures that are not in use anymore - translating and testing complex stored procedures is probably the most time consuming part of a schema translation process.
Here is one of the many SQL scripts we use to discover stored procedure usage, credit to Thomas Stringer on DBA Stack Exchange:
select
    db_name(st.dbid) as database_name,
    object_name(st.objectid) as name,
    p.size_in_bytes / 1024 as size_in_kb,
    p.usecounts
from sys.dm_exec_cached_plans p
cross apply sys.dm_exec_sql_text(p.plan_handle) st
where p.objtype = 'proc'
and st.dbid = db_id('SomeDatabase')
order by p.usecounts desc;
Keep in mind that this script only tells you how many times a stored procedure has been executed since the last time its plan was compiled. Information about how often that happens is in Microsoft's Execution Plan Caching and Reuse page, although that particular article is slightly dated.
As you go through your DDL file you may also find views, indexes or stored procedures that are obsolete because they refer to tables or columns that no longer exist.
Schema translation
Translating a schema from MSSQL to Babelfish isn't that hard. In fact, there are automated services (such as Amazon DMS) that can do it for you.
The advent of advanced AI systems has also drastically lessened the effort required to translate SQL from one syntax to another. It is truly amazing what the latest LLMs can do. We make use of them all the time, and are always testing new LLMs to see which ones are the most accurate for our use case.
Side note: We never train LLMs on customer DDL code, or enter customer DDL code into LLMs (such as non-enterprise ChatGPT) that will train on it.
However, translating a schema well is still a hard, labor-intensive process, and the cost of making a mistake can be catastrophic to an IT system. In all the cases we've seen, there are hard decisions to be made and trade-offs to be considered.
At the end of the day, you need to export your schema's DDL code via SSMS, analyze it via Compass, and translate any unsupported syntax. We make use of AI systems (when clients allow us) to speed up the process, but the primary deliverable is a working SQL file that creates a schema which is compatible with the original DB. The primary objects that need to be converted are:
- Stored procedures
Ideally, someone with professional experience in both SQL Server and PostgreSQL will build the new SQL file, potentially with the help of an LLM. Each RDBMS has its own unique way of doing things, and even if you make something that "works", it may not be optimal.
Our process for translating these objects is simple:
- Spin up a dev environment (usually a single VM with 16 GB RAM in the customer's VPC).
- Put a backup file of MSSQL on the dev instance, as well as a DDL file exported by SSMS
- Deploy MSSQL and Babelfish via Docker Compose and install Compass
- Create a SQL file that you will work on to create a compatible schema for Babelfish
Now it's time to work through the MSSQL DDL file, translating objects in the following order:
- (Optional) types / sequences
- Tables / columns (without constraints)
- Stored procedures
The translation process isn't complex; you just have to do the work. It is a long, detail-oriented process, so be sure to constantly re-compile the Babelfish database with your in-progress DDL file to ensure you haven't made any mistakes.
Both SQL Server and PostgreSQL offer a comprehensive set of tools and options for permission management, with the principle of least privilege being equally critical in both systems. However, it's important to note some differences in their terminology and functionality.
In SQL Server, logins are used to connect to a database instance and are not directly granted database-level permissions. Instead, permissions must be granted within a specific database to a user mapped to that login. Users can also be added to roles to inherit permissions.
In contrast, PostgreSQL does not employ logins. It uses a single concept, the role, to handle both authentication and authorization. A role can either have login capability (similar to a SQL Server login) or not (similar to a SQL Server role), and roles can inherit permissions from other roles. PostgreSQL does not have a distinct user entity; a 'user' is simply a role with login capability.
Configuring Permissions in Babelfish
Before establishing permissions in Babelfish, a thorough understanding of your target SQL Server database's permission scheme is required. This stage is a good time to audit permissions and check whether they align with your business needs. Keep an eye out for the server-level roles your organization uses, any custom roles and how they are applied, and any unnecessary permissions or outdated logins.
Babelfish drastically reduces the workload of a migration from SQL Server to PostgreSQL. However, as a relatively new technology, it does not yet support the entire breadth of SQL Server's functionality, including certain permission features.
At the time of writing this article, Babelfish supports the GRANT and REVOKE of SELECT, INSERT, UPDATE, DELETE, REFERENCES on tables and views, and EXECUTE on stored procedures. However, it does not support granting permissions on a database or schema level or command permissions such as GRANT CREATE TABLE.
For optimal use of Babelfish, create permissions in the same SQL dialect in which the object was created; that way your T-SQL schema can be used directly, without translation. Creating objects and permissions in different dialects can lead to unexpected issues. Also, when upgrading to a newer version of Babelfish, previous GRANT statements may need to be redone if they were issued in both the T-SQL and PostgreSQL dialects.
Currently, Babelfish does not support changing database ownership. As a workaround, logins can be adjusted to own a specific database through a series of steps using sysadmin and a new login.
Keep in mind that while this workaround may be effective for lower-environment testing, you may need more fine-grained permissions for production. We recommend running a script that grants table-level permissions to users after creating a database owner, and refreshing that permission script each time a new table is added to your database.
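Such a permission script can be generated mechanically from a list of tables. A minimal sketch (the table and role names are hypothetical placeholders, and the privilege list reflects the table-level grants Babelfish supports):

```python
def table_grants(tables, role, privileges=("SELECT", "INSERT", "UPDATE", "DELETE")):
    """Generate one table-level GRANT statement per (schema, table) pair.

    `tables` would typically come from querying your database's catalog;
    the names used here are purely illustrative.
    """
    return [
        f"GRANT {', '.join(privileges)} ON {schema}.{table} TO {role};"
        for schema, table in tables
    ]

# Hypothetical discovered tables and role name:
statements = table_grants([("dbo", "customers"), ("dbo", "orders")], "app_user")
for stmt in statements:
    print(stmt)
```

Re-running the generator after schema changes keeps the grant script in sync with newly created tables.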
Sample Docker Compose file
Here is a Docker Compose file that will spin up a Babelfish and an MSSQL instance, as well as mount SQL files from your local machine into the containers for initial DB setup. We recommend running the DDL file exported from SSMS on SQL Server startup (this corresponds to MSSQL-SETUP-FILE.sql in the example below).
This way, if either of the databases gets into a weird state because of testing you're doing on them, you can just blow away the entire database and start from scratch.
command: ["-m", "multi-db"]
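A minimal sketch of what such a Compose file can look like. The mcr.microsoft.com/mssql/server image and its ACCEPT_EULA / MSSQL_SA_PASSWORD variables are Microsoft's documented values; the Babelfish image name, its environment variables, the port mappings, and the mount paths are placeholders you would substitute with your own build:

```yaml
services:
  mssql:
    image: mcr.microsoft.com/mssql/server:2022-latest
    environment:
      ACCEPT_EULA: "Y"
      MSSQL_SA_PASSWORD: "YourStrong!Passw0rd"
    ports:
      - "1433:1433"
    volumes:
      # DDL exported from SSMS, run once at startup
      - ./MSSQL-SETUP-FILE.sql:/setup/MSSQL-SETUP-FILE.sql

  babelfish:
    # Placeholder: substitute whatever Babelfish image/build you use
    image: your-babelfish-image:latest
    environment:
      BABELFISH_PASS: "YourStrong!Passw0rd"  # assumed variable name
    ports:
      - "1434:1433"  # TDS (T-SQL) port, remapped to avoid clashing with mssql
      - "5432:5432"  # PostgreSQL port
    command: ["-m", "multi-db"]
```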
Data movement
Please allow us to be a little sales-y for a bit.
We truly believe that our product, Albatross, is the best solution on the market for migrating data from SQL Server to Babelfish for PostgreSQL.
We also offer a free, open-source tool called SQLpipe that can transfer data between SQL Server and Babelfish, albeit only one table at a time.
Why not use native tools?
It is feasible to use bcp and psql to export data from MSSQL and import data into Babelfish. In case you're unaware, bcp is a program from Microsoft that allows you to export and import CSVs from MSSQL, and psql is a command line client which allows you to do the same for PostgreSQL.
However, we recommend our own tools over these native solutions for three reasons:
- We have not found bcp to be a reliable option for inserting data into Babelfish. Here is an open GitHub issue we submitted on the topic. Maybe it will be resolved!
- The relative lack of compatibility between SQL Server and PostgreSQL's CSV export / import tools.
- Albatross in particular has a few nice features that make the whole process easier.
Ideally, you would be able to export a CSV from MSSQL and import it into Babelfish using bcp alone. The export half works great - bcp is excellent (and fast) at exporting data from MSSQL. However, since bcp is unreliable when inserting a CSV into Babelfish, we must use psql for the import.
The main issue is that the CSVs exported by bcp are not compatible with psql. bcp cannot create RFC 4180 compliant CSVs because it does not support optionally enclosing fields (for example, in double quotes) when a field contains the delimiter. This is a major bummer, because psql does not support multicharacter delimiters - it only supports optionally enclosed fields.
This incompatibility is a crucial disconnect that can only be overcome by creating a custom translation script. It actually isn't that hard to write a script to translate a CSV from one format to another - in fact, that is basically what SQLpipe and Albatross do under the hood. However, you're here to migrate a database, not get into the weeds of CSV parsing details.
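To illustrate the idea (not the actual SQLpipe/Albatross internals), here is a minimal sketch of such a translation script. It assumes bcp was run with a multicharacter field terminator, such as `|~|`, that is guaranteed not to appear in the data:

```python
import csv

def bcp_to_rfc4180(src_path, dst_path, field_sep="|~|", encoding="utf-8"):
    """Convert a bcp character-format export into an RFC 4180 CSV psql can ingest.

    Assumes bcp exported with a multicharacter field terminator (e.g. -t "|~|")
    that never appears inside the data itself. The csv module then handles
    quoting any field that contains a comma, quote, or newline.
    """
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL)
        for line in src:
            # bcp cannot quote fields, so splitting on the terminator is safe
            writer.writerow(line.rstrip("\r\n").split(field_sep))
```

The resulting file can then be loaded over the PostgreSQL port with something like `\copy target_table from 'out.csv' with (format csv)`.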
SQLpipe is a tool that allows you to move the result of an arbitrary query from one database to another. For example, you could run a discovery query to find all schemas and tables in a given database, then transfer each one by running `select * from <schema_name>.<table_name>` for every object you find.
This is a perfectly valid way to move a copy of your data! Just be sure to pull the data from a snapshot (or inactive DB) so you don't transfer data that violates foreign key constraints. Also, you should manually create the objects in your new, target database without using SQLpipe's DDL generation feature. This is because SQLpipe does not transfer primary key constraints, foreign key constraints, indexes, etc, when moving the result of a query.
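The discovery-then-transfer loop described above amounts to plain query generation. A sketch, with a hypothetical discovery result standing in for what you would get from querying the source catalog:

```python
def transfer_queries(tables):
    """Build the per-table SELECT statements to hand to a transfer tool.

    `tables` is a list of (schema, table) pairs, e.g. the result of a
    discovery query against information_schema.tables on the source DB.
    """
    return [f"select * from {schema}.{table}" for schema, table in tables]

# Hypothetical discovery output:
discovered = [("dbo", "customers"), ("dbo", "orders"), ("sales", "invoices")]
for query in transfer_queries(discovered):
    print(query)  # each query is one SQLpipe transfer
```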
Also, be sure to add indexes and constraints after the data has been moved - this will lessen the chance of transfer errors and drastically improve performance.
Albatross is a specialized version of SQLpipe that excels at database transfers. It has three key features that improve the speed and safety of a database migration.
First, it automatically discovers objects in your source database and validates that those objects are present (and compatible) in your target database.
Second, it concurrently transfers multiple tables at a time, starting with the largest, as measured by size in bytes. It gives you the option to perform this extraction from a separate DB (such as a snapshot or backup) to ensure you're pulling from a static data source. This also has the side benefit of reducing load on production systems.
Third, once the initial load is complete, Albatross automatically replicates data changes from your source DB to your new target DB. This allows you to perform a changeover with minimal downtime.
Data comparison
Once your SQL Server schema is transferred and your initial data load is done, it is essential to verify that the data in the two systems matches (assuming you have pulled data from a snapshot, the two databases should be identical).
It's important to note that this is just a sanity check and not a valid substitute for extensive and diligent application testing.
The tool we recommend for this task is DataGrip, an IDE for databases by JetBrains. The Compare Content action allows you to compare data between two databases at a table, view or materialized view level, which means that you can only compare one table at a time.
All you need to do is connect to your SQL Server source and the newly created Babelfish database, select the two tables you want to compare, right click and choose Tools -> Compare Content.
This will give you a comparison view of the two tables side by side. Color coding highlights entire rows that differ, as well as the individual cells that differ within a row.
Keep in mind that some of the data may differ due to differences in the systems' functionality. For instance, PostgreSQL stores boolean values as true/false, while SQL Server's bit type uses 1/0.
In PostgreSQL, the maximum precision of the timestamp type is 6 fractional-second digits, while in SQL Server the limit for the comparable datetime2 type is 7 digits. In such cases, even though the data is essentially the same, it is still flagged as different.
A workaround is to utilize the Tolerance field which allows you to specify how many columns might differ for two rows to be considered equal. For instance, if you have a table with 5 columns, one of which is of type bool, you can set the tolerance level to 1 and ignore the differences in the boolean values.
When dealing with timestamp-related precision differences, you could also truncate the datetime2 values in your source SQL Server database to a precision of 6 digits.
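If you go the truncation route, the key detail is that the seventh digit is truncated, not rounded, so that both sides compare equal. A small illustration of that transformation on timestamp strings (on the real database you would do this in SQL; this is just the semantics):

```python
def truncate_fractional_seconds(ts: str, digits: int = 6) -> str:
    """Truncate (not round) the fractional-seconds part of a timestamp string.

    Example: '2024-01-15 10:30:00.1234567' -> '2024-01-15 10:30:00.123456'
    Values without a fractional part are returned unchanged.
    """
    if "." not in ts:
        return ts
    whole, frac = ts.rsplit(".", 1)
    return f"{whole}.{frac[:digits]}"
```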
Final changeover
The final changeover is when you finally point all of your applications (or point load at new, modified applications) at your new database. This is an extremely high-leverage moment that requires the utmost preparation and care.
You should rehearse this changeover process at least once in a staging or dev environment, and you must have a contingency plan to roll back to your original environment in case unacceptable errors appear once the new tech stack goes live.
It is impossible to give specific advice for this stage without detailed knowledge of your given environment. Every organization's applications, deployment methods, and downtime tolerance is different. We offer consulting services that can help you prepare for this part of the process.
- Postgres, and by extension Babelfish, has a limit of 100 parameters per stored procedure. SQL Server, on the other hand, allows up to 2100 parameters. If you have stored procedures with over 100 parameters, you will have to evaluate which parameters are actually in use and rewrite the stored procedure accordingly.
- A drawback of Babelfish you need to be aware of is that stored procedures are not checked until they are called by your application. This means that a faulty stored procedure may compile during the initial schema creation, but could potentially error out later on. This can be avoided by compiling the translated DDL file back on SQL Server, which would error out if it encounters a faulty stored procedure.
- If you are using psql to move your data, be aware that PostgreSQL limits identifiers, including column names, to 63 characters. Be sure to check whether any of your column names exceed that limit and shorten them accordingly.
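A quick way to flag offending names before the transfer, assuming you have already extracted your column names into a list:

```python
def overlong_identifiers(names, limit=63):
    """Return the identifiers that exceed PostgreSQL's length limit.

    PostgreSQL's limit is technically 63 bytes; len() is a fine
    approximation for ASCII identifiers.
    """
    return [n for n in names if len(n) > limit]

# Hypothetical column names pulled from the source schema:
columns = ["customer_id", "a_very_" + "long_" * 12 + "column_name"]
print(overlong_identifiers(columns))
```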
- Old data in your database can violate constraints, such as foreign keys created WITH NOCHECK in SQL Server, preventing those constraints from being created in Babelfish. In SQL Server, a WITH NOCHECK foreign key constraint is enforced for new data entries without checking whether already-existing data complies; Babelfish ignores the NOCHECK clause and validates all rows. Your options are to remove the incompatible records or to modify/omit the constraint itself.
- SQL Server is deprecating the text, ntext and image data types. Check if your schema makes use of those and convert them to varchar(max), nvarchar(max) and varbinary(max) respectively.
- Using the SQL Server timestamp type can cause confusion during the post-migration verification process. In SQL Server, timestamp (also known as rowversion) is a unique binary identifier, not an actual datetime value. Keep in mind that during the data transfer the original timestamp values are ignored and replaced by values generated by Babelfish.
We hope this article helped you understand how to migrate from SQL Server to Babelfish at a high level. If you would like help with such a project, please send us a message on our contact page; we would be happy to assist!