Preparation / decision making
Babelfish is a PostgreSQL extension from Amazon that allows PostgreSQL to understand queries written in SQL Server's T-SQL syntax.
Understanding the choice of using Babelfish
Using Babelfish may seem like an easy choice, but there are many factors to consider. In our opinion, the most important pros and cons of Babelfish are:
- Drastically reduces the amount of work that needs to be done to translate stored procedures, client queries, column types, etc.
- Babelfish, like PostgreSQL, is open-source. There is no vendor lock-in.
- Babelfish does not support 100% of SQL Server's features or syntax, so you must find workarounds.
- As far as we know, Amazon Aurora is the only managed database that offers the Babelfish feature, so you will have to use Aurora, or run Babelfish yourself. This may change in the future, though.
If your organization uses AWS and has historically used SQL Server as your main relational database, then it may make sense to investigate Babelfish. However, if your team has lots of PostgreSQL experience and you only have a few SQL Server databases, then it may be worth it to homogenize your data stack on PostgreSQL. In the end, the choice is dictated by your individual circumstances.
The rest of this guide assumes you have already decided to use Babelfish. If you are still undecided if you should use the technology, it may be helpful to check out our SQL Server to PostgreSQL Migration Guide.
Identify non-compatible features
A useful tool that can provide valuable insights into the scope of the required work is Babelfish Compass (short for “COMPatibility ASSessment”). It was specifically designed to analyze SQL Server DDL code and identify features that are not compatible with Babelfish. Compass also has the ability to automatically rewrite some T-SQL constructs unsupported by Babelfish, such as the MERGE statement. We have an AWS Babelfish Compass Guide that will show you how to get started with Compass.
It is important to note that while the team behind Babelfish continuously adds new features, it does not cover 100% of SQL Server's functionality.
Most non-trivial applications make use of some SQL Server feature that isn't supported by Babelfish. However, an alternative may be available in PostgreSQL. Common examples of these features include:
- DDL Triggers
- XML-related functionality
- Various ALTER statements (e.g. ALTER DATABASE, ALTER SCHEMA, ALTER PROCEDURE, etc.)
In cases where such functionality is essential to your application, you can resort to workarounds that combine Babelfish and plain PostgreSQL. For instance, to make partitioning work in Babelfish, you could create the respective partitioned tables in PostgreSQL using pgAdmin, then change the owner of the tables to dbo so that they can be accessed on both the Babelfish port and the PostgreSQL port.
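A minimal sketch of that partitioning workaround might look like the following. The table, column, and schema names here are hypothetical, and the exact schema/owner names depend on your Babelfish migration mode (in multi-db mode they are typically prefixed with the database name, e.g. mydb_dbo):

```sql
-- Run on the PostgreSQL port (e.g. via pgAdmin), not the TDS port.
-- "orders" and its partition are illustrative names only.
CREATE TABLE dbo.orders (
    order_id   int NOT NULL,
    created_at timestamp NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE dbo.orders_2023 PARTITION OF dbo.orders
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');

-- Hand the tables to dbo so they are visible on the Babelfish (TDS) port too.
ALTER TABLE dbo.orders OWNER TO dbo;
ALTER TABLE dbo.orders_2023 OWNER TO dbo;
```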
Another example is the use of XML-related functionality in stored procedures. While Babelfish does not currently support XML operations, PostgreSQL offers similar functionality (although SQL Server's integration is more extensive). In this case you could recreate the affected stored procedures directly in PostgreSQL using the system-specific functions.
Give your app a haircut
There are almost always schemas, tables, and sometimes even entire databases that are no longer in use. For example, a recent client of ours claimed they needed to migrate 25 databases, but upon further investigation, fewer than 20 were actually used in production.
We use a few ad-hoc SQL scripts to determine how often database objects are actually used - stored procedures, tables, etc. It is essential to do that homework ahead of time so you don't waste effort moving things that aren't needed.
It is especially helpful to identify stored procedures that are not in use anymore - translating and testing complex stored procedures is probably the most time consuming part of a schema translation process.
Here is one of the many SQL scripts we use to discover stored procedure usage, credit to Thomas Stringer on DBA Stack Exchange:
select
    db_name(st.dbid) as database_name,
    object_name(st.objectid) as name,
    p.size_in_bytes / 1024 as size_in_kb,
    p.usecounts
from sys.dm_exec_cached_plans p
cross apply sys.dm_exec_sql_text(p.plan_handle) st
where p.objtype = 'proc'
and st.dbid = db_id('SomeDatabase')
order by p.usecounts desc;
Keep in mind that this script only tells you how many times an SP has been executed since the last time it was compiled. Information about how often that happens can be found on Microsoft's Execution Plan Caching and Reuse page, although that particular article is slightly dated.
As you go through your DDL file you may also find views, indexes or stored procedures that are obsolete because they refer to tables or columns that no longer exist.
Translating a schema from MSSQL to Babelfish isn't that hard. There are a few changes that need to be made, but for the most part, AWS' tool Compass can tell you what needs to be done. Check out our AWS Babelfish Compass Guide article to learn more about it.
However, translating a schema well is a high leverage process that requires experience, attention to detail, and knowledge of the system's context. The costs of a small mistake in your database can be very serious, so we do not recommend trusting the outputs of any automated schema translation tool without keeping a close eye on it.
The recent emergence of AI systems has also drastically lessened the effort required to translate SQL from one dialect to another, should you need to dip into PostgreSQL syntax to replicate some functionality. It is truly amazing what the latest LLMs can do. We make use of them all the time, and are always testing new LLMs to see which ones are the most accurate for our use case.
Side note: We never train LLMs on customer code, or enter customer DDL code into LLMs (such as ChatGPT) that will train on it.
At the end of the day, you will need to export your schema's DDL code via SSMS, analyze it via Compass, and translate any unsupported syntax. The primary deliverable will be a working SQL file that creates a schema which is compatible with the original one. The primary objects that need to be converted are:
- Stored procedures
- Custom Types
- Logins / permissions
Ideally, someone who is experienced working with both SQL Server and PostgreSQL will build the new SQL file, potentially with the help of automated systems. Each RDBMS has its own unique way of doing things, and even if you make something that "works", it may not be optimal.
Our process for translating these objects is as follows:
- Spin up a dev environment (usually a single VM with 16 GB RAM in the customer's VPC).
- Put a backup file of MSSQL on the dev instance, as well as a DDL file exported by SSMS.
- Deploy MSSQL and Babelfish via Docker Compose and install Compass.
- Create a SQL file that you will work on to create a compatible schema for Babelfish.
Now it's time to work through the MSSQL DDL file, translating objects in the following order:
- (Optional) Custom types and sequences
- Tables / columns (without constraints)
- Stored procedures
The translation process isn't complex; you just have to do the work. It is a long and detail-oriented process, so be sure to constantly re-compile the Babelfish database with your in-progress DDL file to ensure that you haven't made any mistakes.
Both SQL Server and PostgreSQL offer a comprehensive set of tools and options for permission management. However, it's important to note some differences in their terminology and functionality.
In SQL Server, logins are used for connecting to a database instance and are not directly granted permissions. Instead, permissions must be granted to a specific database through a user mapped to that login. Users can also be added to roles to inherit permissions.
By contrast, PostgreSQL does not employ logins. It uses the term 'role' to handle both authentication and authorization. Roles can either be able to log in (similar to a SQL Server login) or not (similar to a SQL Server role), and can also inherit permissions from other roles. PostgreSQL does not have a distinct user entity; a 'user' is simply a role with login capability.
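To make the mapping concrete, here is a rough side-by-side sketch. All names (app_login, app_user, readonly_role) are illustrative, not part of either system:

```sql
-- SQL Server (T-SQL): a login for authentication, plus a database
-- user mapped to it for authorization, optionally added to a role.
CREATE LOGIN app_login WITH PASSWORD = 'secret';
CREATE USER app_user FOR LOGIN app_login;
ALTER ROLE db_datareader ADD MEMBER app_user;

-- PostgreSQL: one role handles both jobs; LOGIN makes it a "user".
CREATE ROLE app_user LOGIN PASSWORD 'secret';
GRANT readonly_role TO app_user;  -- readonly_role is a hypothetical group role
```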
Configuring Permissions in Babelfish
Before establishing permissions in Babelfish, a thorough understanding of your target SQL Server database's permission scheme is required. This stage is a good time to audit permissions and check if they align with your business needs.
As a relatively new technology, Babelfish drastically reduces the workload of a migration from SQL Server to PostgreSQL. However, bear in mind that it does not yet support the entire breadth of SQL Server's functionality, including certain permissions.
At the time of writing this article, Babelfish supports the GRANT and REVOKE of SELECT, INSERT, UPDATE, DELETE, REFERENCES on tables and views, and EXECUTE on stored procedures. However, it does not support granting permissions on a database or schema level or command permissions such as GRANT CREATE TABLE.
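For example, the following T-SQL statements fall within that supported set, while the commented-out ones do not (object and role names are hypothetical):

```sql
-- Supported object-level grants:
GRANT SELECT, INSERT, UPDATE, DELETE ON dbo.orders TO app_user;
GRANT EXECUTE ON dbo.usp_get_orders TO app_user;
REVOKE DELETE ON dbo.orders FROM app_user;

-- Not supported at the time of writing:
-- GRANT CREATE TABLE TO app_user;           -- command permission
-- GRANT SELECT ON SCHEMA::dbo TO app_user;  -- schema-level grant
```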
For optimal use of Babelfish, create permissions in the same SQL dialect as the objects they apply to; that way your T-SQL schema, including its GRANT statements, can be used directly without translation. Creating objects and permissions in different dialects can lead to unexpected issues. Also, when upgrading to a newer version of Babelfish, previous GRANT statements may need to be reissued if they were granted in both T-SQL and PostgreSQL.
Currently, Babelfish does not support changing database ownership. As a workaround, logins can be adjusted to own a specific database through a series of steps using sysadmin and a new login.
Keep in mind, while this workaround may be effective for lower environment testing, you may need more detailed permissions for production. We recommend running a script to grant table-level permissions to users after creating a database owner, but refresh your permission script each time a new table is generated in your database.
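A minimal sketch of such a refresh script, using T-SQL to generate one GRANT statement per table (the role name app_user is a placeholder; adapt the permission list to your needs):

```sql
-- Generates object-level GRANT statements for every table in the
-- current database; rerun it whenever new tables are created, then
-- execute the generated statements.
SELECT 'GRANT SELECT, INSERT, UPDATE, DELETE ON '
       + QUOTENAME(s.name) + '.' + QUOTENAME(t.name)
       + ' TO app_user;'   -- app_user is a placeholder role
FROM sys.tables t
JOIN sys.schemas s ON s.schema_id = t.schema_id;
```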
Sample Docker Compose file
Here is a Docker Compose file that will spin up a Babelfish and MSSQL instance, as well as transfer SQL files from your local machine to the Docker images for initial DB setup. We recommend running the DDL file from SSMS on SQL Server startup (this would correspond to MSSQL-SETUP-FILE.sql in the example below).
This way, if either of the databases gets into a weird state because of testing you're doing on them, you can just blow away the entire database and start from scratch.
command: ["-m", "multi-db"]
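A full compose file along those lines might look roughly like the following sketch. The image names, passwords, ports, and mount paths are assumptions (there is no official AWS Babelfish image on Docker Hub, so substitute whichever build you use); the `-m multi-db` flag runs Babelfish in multi-database migration mode:

```yaml
# Sketch only - image names, credentials, and paths are assumptions.
services:
  mssql:
    image: mcr.microsoft.com/mssql/server:2019-latest
    environment:
      ACCEPT_EULA: "Y"
      MSSQL_SA_PASSWORD: "YourStrong!Passw0rd"
    ports:
      - "1433:1433"
    volumes:
      - ./MSSQL-SETUP-FILE.sql:/setup/MSSQL-SETUP-FILE.sql
  babelfish:
    image: your-babelfish-image   # community or self-built; none is official
    command: ["-m", "multi-db"]   # multi-database migration mode
    environment:
      POSTGRES_PASSWORD: "postgres"
    ports:
      - "1434:1433"   # TDS (SQL Server wire protocol) endpoint
      - "5432:5432"   # PostgreSQL endpoint
    volumes:
      - ./babelfish-schema.sql:/setup/babelfish-schema.sql
```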
Our free data migration tool can move all of your data from SQL Server to Babelfish. Please check out our database migration products page for more information.
Why not use native tools?
It is feasible to use bcp and psql to export data from MSSQL and import data into Babelfish.
In case you're unaware, bcp is a program from Microsoft that allows you to export and import CSVs from MSSQL, and psql is a command line client which allows you to do the same for PostgreSQL.
However, we recommend our own tools over these native solutions for three reasons:
- We have not found bcp to be a reliable option for inserting data into Babelfish. Here is an outstanding GitHub issue we submitted on the topic. Maybe it will be resolved!
- SQL Server's and PostgreSQL's CSV export / import tools are not fully compatible with each other.
- Albatross has a few nice features that make the whole process easier. In particular, the replication feature of our paid tool is extremely helpful.
Ideally, you would be able to export a CSV from MSSQL and import it into Babelfish using bcp alone. The export half works great - bcp is excellent (and fast) at exporting data from MSSQL. However, since bcp is unreliable when inserting a CSV into Babelfish, you must use psql for the import.
The main issue is that the CSVs exported by bcp are not compatible with psql. bcp cannot produce RFC 4180 compliant CSVs because it does not support optionally enclosing fields (for example, in double quotes) when a field contains the delimiter. The usual bcp workaround is a multi-character delimiter, which is a major bummer here, because psql only supports single-character delimiters and relies on optional enclosing instead.
This incompatibility is a crucial disconnect that can only be overcome by creating a custom translation script. It actually isn't that hard to write a script to translate a CSV from one format to another - in fact, that is basically what our migration tools do under the hood. However, you're here to migrate a database, not get into the weeds of CSV parsing details.
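As an illustration of what such a translation script involves, here is a minimal Python sketch (not our actual tool). It assumes bcp was run with a multi-character field terminator, here the hypothetical choice `|~|`, and one row per line, and rewrites the data as RFC 4180 CSV that psql's \copy can ingest:

```python
import csv

def bcp_to_csv(src_path: str, dst_path: str, delim: str = "|~|") -> None:
    """Rewrite a bcp export that uses a multi-character field terminator
    (here the hypothetical '|~|') as an RFC 4180 CSV. csv.writer encloses
    in double quotes any field containing a comma, quote, or newline."""
    with open(src_path, newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for line in src:
            writer.writerow(line.rstrip("\r\n").split(delim))
```

You would then load the result with something like `\copy my_table from 'out.csv' with (format csv)` on the PostgreSQL side.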
Once your SQL Server schema is transferred and your initial data load is done, it is essential to verify that the data in the two systems matches (assuming you have pulled data from a snapshot, the two databases should be identical).
It's important to note that this is just a sanity check and not a valid substitution for extensive and diligent application testing.
The tool we recommend for this task is DataGrip, an IDE for databases by JetBrains. Its Compare Content action compares data between two databases at the table, view, or materialized view level, which means you can only compare one table at a time.
All you need to do is connect to your SQL Server source and the newly created Babelfish database, select the two tables you want to compare, right click and choose Tools -> Compare Content.
This will give you a comparison view of the two databases side by side. Color coding shows entire rows that differ as well as individual cells that differ in a column.
Keep in mind that some of the data may differ due to differences in the systems' functionality. For instance, PostgreSQL stores boolean values as true/false, while SQL Server uses 1/0.
In PostgreSQL, the maximum precision of the timestamp type is 6 fractional digits, while in SQL Server the limit for the timestamp-related datetime2 type is 7 digits. In such cases, even though the data is essentially the same, it is still flagged as different.
A workaround is to utilize the Tolerance field which allows you to specify how many columns might differ for two rows to be considered equal. For instance, if you have a table with 5 columns, one of which is of type bool, you can set the tolerance level to 1 and ignore the differences in the boolean values.
When dealing with timestamp-related precision differences, you could also truncate the datetime2 values in your source SQL Server database to precision of 6 digits.
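For instance, a hypothetical events table could have its column narrowed to 6 digits of fractional precision on the SQL Server side like this (note that SQL Server rounds, rather than truncates, when lowering datetime2 precision):

```sql
-- Table and column names are hypothetical; run on a copy of the data
-- if the original values must be preserved.
ALTER TABLE dbo.events ALTER COLUMN created_at datetime2(6);
```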
The final changeover is when you finally point all of your applications (or point load at new, modified applications) at your new database. This is an extremely high leverage moment that requires the utmost preparation and care.
You should have rehearsed this changeover process at least once in a staging or dev environment and must make a contingency plan to roll back to your original environment in case some unacceptable errors appear once the new tech stack goes live.
It is impossible to give specific advice for this stage without detailed knowledge of your given environment. Every organization's applications, deployment methods, and downtime tolerance is different. We offer consulting services that can help you prepare for this part of the process.
- Postgres, and by extension Babelfish, has a limit of 100 parameters per stored procedure. SQL Server, on the other hand, allows for up to 2100 parameters. If you have stored procedures with over 100 parameters, you will have to evaluate which parameters are actually in use and rewrite the stored procedure accordingly.
- A drawback of Babelfish you need to be aware of is that stored procedures are not checked until they are called by your application. This means that a faulty stored procedure may compile during the initial schema creation, but could potentially error out later on. This can be avoided by compiling the translated DDL file back on SQL Server, which would error out if it encounters a faulty stored procedure.
- If you are using psql to move your data, be aware that PostgreSQL truncates identifiers, including column names, to 63 characters. Be sure to check whether any of your columns exceed that limit and shorten them accordingly.
- Existing data in your database can violate constraints such as foreign keys created WITH NOCHECK in SQL Server, preventing the schema from compiling. In SQL Server, a FK constraint created WITH NOCHECK is enforced only for new data entries, without checking whether the already existing data complies with the constraint; Babelfish, however, ignores the NOCHECK clause. Your options are to remove the incompatible records or to modify/omit the constraint itself.
- SQL Server is deprecating the text, ntext and image data types. Check if your schema makes use of those and convert them to varchar(max), nvarchar(max) and varbinary(max) respectively.
- Using the SQL Server timestamp type can cause confusion during the post-migration verification process. In SQL Server, timestamp is a unique binary row-version identifier, rather than an actual datetime value. Keep in mind that during the data transfer the original timestamp values are ignored and replaced by values generated by Babelfish.
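The deprecated-type conversions mentioned above are straightforward ALTER statements on the SQL Server side (table and column names here are hypothetical):

```sql
-- text -> varchar(max), ntext -> nvarchar(max), image -> varbinary(max)
ALTER TABLE dbo.documents ALTER COLUMN body      varchar(max);
ALTER TABLE dbo.documents ALTER COLUMN body_intl nvarchar(max);
ALTER TABLE dbo.documents ALTER COLUMN scan_blob varbinary(max);
```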
We hope this article helped you understand how to migrate from SQL Server to Babelfish at a high level. If you would like help with such a project, please send us a message on our contact page, we would be happy to assist!