Has anybody already tried and succeeded in migrating data between two operational databases with DBT or any other ELT tool?
Although ETL is usually used for this, I was wondering how ELT could be applied to this use case, for example:
Use models to reproduce the target db tables, and then simply bulk load them into the target db (operational databases are usually not supported as dbt warehouses, PostgreSQL being the exception).
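To make that concrete, here is a minimal sketch of such a model, assuming the legacy schema is declared as a dbt source; all table and column names are made up:

```sql
-- models/target_customers.sql
-- The output has exactly the shape of the target operational table,
-- so the result can later be bulk-copied into the target database as-is.
select
    c.customer_id          as id,
    upper(c.country_code)  as country_iso2,
    c.created_at::date     as signup_date
from {{ source('legacy_erp', 'customers') }} as c
```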
We have actually been looking at using DBT for this exact scenario. Replicate and/or somehow pull in all the source data into staging schemas, do all of the transformations internally, and then just push the finished data as-is out into the target systems.
We were thinking about using Postgres as the data migration database where all of the dbt transformations take place. Some of its features, stored procedures for example, could be useful, because in data migration you can run into a lot of funny requirements that go beyond dbt’s normal model and DAG abilities.
We are also thinking about using Amazon DMS to replicate the data from the sources into staging schemas in Postgres.
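To illustrate, a model could read from a DMS-landed staging schema and, where a plain model isn’t enough, call a Postgres stored procedure through a pre-hook. This is only a sketch: the schema and procedure names are hypothetical, and `call` assumes Postgres 11+.

```sql
-- models/migration/customers_clean.sql
{{ config(
    materialized = 'table',
    pre_hook = "call staging.dedupe_customers()"  -- hypothetical stored procedure
) }}

select
    customer_id,
    trim(customer_name) as customer_name,
    nullif(email, '')   as email
from staging_crm.customers  -- staging schema assumed to be populated by Amazon DMS
```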
Thank you @josh, I hope you find success, and some time to give us some feedback!
The more I use DBT, the more I see use cases…
In the end, could we think of DBT as a functional way of processing data, with each model being a function F(M, M’, M’’…) -> M’’’ that generates one new model from the preceding ones in the DAG?
Are models actually materialized only for performance purposes?
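To illustrate the first question, a sketch of a model as a function of its parents via ref() (model names are made up):

```sql
-- models/m3.sql
-- m3 = F(m1, m2): a new model derived only from its upstream models in the DAG.
select
    o.order_id,
    o.order_total,
    c.customer_segment
from {{ ref('m1') }} as o
join {{ ref('m2') }} as c
  on o.customer_id = c.customer_id
```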
In short, dbt is not an ETL tool: it is not designed to move data between sources and targets; rather, it expects your data to already be in your warehouse. You can read more about our thoughts on this here.
You could definitely try to push dbt to do the things you want to do, but it will feel like swimming upstream.
If you’re looking for a tool to move data from an operational database to a data warehouse, we typically use Stitch or Fivetran to do this for us.
Hi @claire, thank you!
Of course, DBT works on top of a data warehouse.
Have you ever used an ETL tool to migrate an entire operational database to another operational one, with different schemas? It’s… painful, and not really iterative.
But let’s imagine:

1. Bulk copy the source db into PostgreSQL (like we would do in a DWH staging area),
2. Use seeds to load all the needed configuration (codification mappings, lookup tables…),
3. Write DBT models, not to produce a BI schema, but an exact copy of the target operational schema (see the sketch after this list),
4. Bulk load the models’ schema into the target database.
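For step 3, a model could join the legacy data to a seed that holds a codification mapping; every name below is illustrative only:

```sql
-- models/target_db/addresses.sql
-- Reshape a legacy table into the exact layout of the target operational table,
-- using a seed (loaded with `dbt seed`) for the codification mapping.
select
    a.address_id,
    a.street_line_1,
    m.target_country_code as country_code
from legacy.addresses as a
left join {{ ref('country_code_mapping') }} as m
  on a.old_country_code = m.source_country_code
```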
I totally agree this is not what DBT was built for, but it would be a real gain to use a functional language like SQL. And, icing on the cake, we could use all of DBT’s testing possibilities to validate the data!
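For example, a singular test (a SQL file under tests/ that fails if it returns any rows) could check that every migrated row satisfies a target-side constraint. A sketch with made-up names:

```sql
-- tests/assert_addresses_have_country_code.sql
-- Fails if any migrated address could not be mapped to a target country code.
select address_id
from {{ ref('addresses') }}
where country_code is null
```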
I just want to clarify here after re-reading this article. I had something fairly specific in mind when talking about “Data Migration.” The specific meaning of Data Migration that I had in mind is:
A project undertaken when implementing a new line-of-business system, which requires taking the data in the (probably relational rowstore) database that backs the old line-of-business system, then mapping, cleansing, and transforming it so it can be inserted into the new line-of-business system.
One example of a project like this would be moving from Oracle E-Business Suite to another ERP system.
Data migrations in this context tend to be one-time affairs where a large volume of data is extracted, transformed, cleansed, and loaded into the database which is the back-end for the new line-of-business system. Now, the migration may need to be run multiple times for testing purposes, but once it is run “for real”, the cutover to the new system happens and the migration is never run again.
This definition of “Data Migration” may not be what was meant by the original poster. They may have meant a more generic definition of “Data Migration” which basically just means copying data from one database to another database via replication or some other means. Clearly DBT is not a data replication tool and is meant to operate only within one database platform.
Having said that, in the more detailed and specific definition of Data Migration which I provided above, we believe that DBT would provide significant value because there is often a lot of complexity in the transformations, significant ongoing development over the course of a new system implementation project, and great value in having automated testing of data.
What I would want is for DBT to be able to push data OUT to another database. As an example, we are now creating a dataset in Snowflake that is directly consumed by an API. For that, the Snowflake instance needs to ‘stay alive’ to get good response times. It would be great if, after the final step of processing data, we could push the data out to a DB like memcached / Redis, or even just Postgres.
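dbt won’t do that last hop itself, but one workaround is to unload the final model from Snowflake into an external stage and let a separate job load it into the serving store. A rough sketch, with the stage and table names assumed:

```sql
-- Run after the final model is built (outside dbt, or via a post-hook / run-operation).
-- Unloads the API dataset to an external stage; a separate loader then pushes it
-- into Postgres, Redis, etc.
copy into @api_export_stage/api_dataset/
from analytics.api_dataset
file_format = (type = csv compression = gzip)
overwrite = true;
```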