Hi,
I’m struggling to get the second data model query to read the output from the first data model file.
The first data model SQL ran without issue; the only parameter I used was {{ config(materialized='table') }}, with no alias assigned.
But when I tried
select *
from {{ ref('my_first_dbt_model') }}
where id = 1
I got
Server error: Database Error in rpc request (from remote system)
Could not find parent table for alias "segment.dbt_pfan.my_first_dbt_model".
I tried assigning the first data model SQL an alias and referencing that in the second file, but it still doesn’t work.
Hey @pfan, can you post the full content of your two models?
This isn’t the error message I’d expect, but just checking: did you save the first model before running dbt run? (This caught me out when I first started using dbt Cloud.)
By the way - the error you’re seeing is from your data warehouse (I think it’s Redshift?) not dbt itself. Could you also check whether the table/view segment.dbt_pfan.my_first_dbt_model exists in the warehouse?
If it does exist, then it could be a permissions error for your dbt user
Do you have access to another database query tool, e.g. DataGrip or even the AWS Redshift Console? If so, you can check there. The reason I suggest these is that if you have a more powerful account you can use to access those then you can confirm whether the model exists even if your dbt user doesn’t have access.
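For example, here’s a quick existence check you could run from any of those tools (a sketch; it uses the standard information_schema, and the schema/table names come from your error message):
-- Does the model exist in the warehouse? Run this as a user with broad read access.
select table_schema, table_name, table_type
from information_schema.tables
where table_schema = 'dbt_pfan'
  and table_name = 'my_first_dbt_model';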
My other suggestion is that I noticed in your screenshot you have a lot of custom code. It might be worth going back to the default code (you can find it here if you need it) and making sure that works before continuing on. That way you can check whether the problem is your database connection etc or a mistake in your model itself.
Does this mean you managed to make it work? I’m unclear on whether you still need help. If your connection is still not working, I recommend you reach out to the dbt Cloud support team using the chat bubble in-app, as they will have the ability to review logs etc to see what’s happening.
dbt does not currently have native support for materialized views. The different warehouses/data platforms don’t have consistent implementations for materialized views yet so it’s not practical for us to build an abstraction over them. There is a sample package here with demo code, but it’s not something I’d look into until I was already more familiar with dbt.
So I was able to pull data from my Redshift database with the first model, my_first_dbt_model.sql. It returns a list of user_ids.
However, when I tried to reference the output of that SQL file, I got the error I showed in my original message.
This sounds like you tried to use the user_ids directly in the next query, something like this:
--This won't work
select *
from {{ source('schema', 'table') }}
where user_id in ({{ ref('my_first_dbt_model') }})
Am I understanding you correctly? If so, that won’t work - ref returns the full table name, not the list of data it contains.
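If filtering on those user_ids is what you want, the usual pattern is to select from the ref’d model in a subquery, something like this (a sketch, reusing the placeholder source above):
--This works: the model is queried as a table, not pasted in as data
select *
from {{ source('schema', 'table') }}
where user_id in (select user_id from {{ ref('my_first_dbt_model') }})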
If not, can you please post the full code of my_first_dbt_model.sql and whatever second model you were writing when you got the error message? You can post code by typing three backticks (```) at the start and end of a code block.
my_first_dbt_model.sql:
{{ config(materialized='table') }}
select distinct user_id
from public.pages
my_second_dbt_model.sql:
select *
from {{ ref('my_first_dbt_model') }}
where user_id = 1
So I think my understanding of what {{ ref('my_first_dbt_model') }} does is wrong. I thought it would return the output of the query from the first model. If it simply returns the name of the first dbt model, and that model hasn’t been materialized in Redshift, that explains why the second query failed: the table doesn’t exist in my database.
So I’m a little confused as to how to use dbt models and build materialized views.
My objective is using dbt to create intermediate tables that are easier to query than using a long and complex query to pull data directly.
Those two queries are correctly written (the first one should really use a source to reference public.pages, sketched below, but we can set that aside for a moment).
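For completeness, declaring that source might look something like this (a sketch; the file path and source name are my assumptions):
models/sources.yml:
version: 2
sources:
  - name: public
    tables:
      - name: pages
and then the model would read from {{ source('public', 'pages') }} instead of the hard-coded table name.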
When you invoke dbt run, it should compile the refs into actual table names and run code along the lines of
create table something.something.my_first_dbt_model as (
select distinct user_id
from public.pages
)
and
create or replace view something.something.my_second_dbt_model as (
select *
from something.something.my_first_dbt_model
where user_id = 1
)
You’ll see that the second one doesn’t return the output of the first query per se (i.e. it’s not returning an array of [1, 2, 3, 4, ... 99999]), just the name of the first model. This does let you build intermediate tables that are easier to query, so you’re on the right track conceptually.
Have you gone through the getting started tutorial? It looks like you might have dived straight into trying to build your own project given that you’re referencing a custom table (public.pages). The tutorial covers all the best practices and has Redshift-specific guidance.
Ah I didn’t run dbt run on my_first_dbt_model.sql so that’s why it didn’t work.
After I did that, I was able to run my_second_dbt_model.sql:
select *
from public.pages
where user_id IN
(select user_id from {{ ref('my_first_dbt_model') }})
limit 10
I was able to run this query using the compiled SQL and it returned results as expected.
But when I ran dbt run on this, it showed an error:
2022-10-10T23:11:00.314388Z: 23:11:00 On model.my_new_project.my_second_dbt_model: /* {"app": "dbt", "dbt_version": "1.2.2", "profile_name": "user", "target_name": "default", "node_id": "model.my_new_project.my_second_dbt_model"} */
create view "segment"."dbt_pfan"."my_second_dbt_model__dbt_tmp" as (
-- Use the `ref` function to select from other models
select *
from "segment"."dbt_pfan"."my_first_dbt_model"
where id = 1
) ;
2022-10-10T23:11:00.320278Z: 23:11:00 Postgres adapter: Postgres error: column "id" does not exist in my_first_dbt_model
It looks like you might not have saved my_second_dbt_model before running: you can see in the log that the compiled SQL still contains where id = 1 from the earlier version of the model, not your new user_id filter. The compile step in the dbt Cloud IDE doesn’t need you to save first, but dbt run only looks at work you’ve saved. (It’s a common pain point, and we’re looking into ways to make this clearer in the revamped IDE we’re building at the moment.)
Thank you! So I need to save the model.sql file before executing dbt run.
One more question - for the tables created from the my_first_dbt_model.sql and my_second_dbt_model.sql files, are they temporary tables or materialized views in Redshift? If other people want to query those tables from another BI tool, can they? And I suppose they can be refreshed by running the models at a scheduled cadence?
By default, they’re built as standard views (note: not materialized views). You can also configure them to build as permanent tables, as sketched below. Check out Materializations | dbt Developer Hub for more info.
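For example, instead of repeating {{ config(materialized='table') }} in every file, you could set it once for the whole project in dbt_project.yml (a sketch; the project name is taken from your logs):
models:
  my_new_project:
    +materialized: table
Either way, the objects dbt builds are ordinary tables or views in your warehouse, so other people and BI tools can query them as long as their database user has the right permissions.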
Absolutely! dbt Cloud comes with a scheduler built in, and since you’re already using the IDE you have access to it: see Schedule a job and Running dbt in production
dbt Core doesn’t come with its own scheduler, so you’ll need a separate workflow orchestration tool. We list some options here: About deployments | dbt Developer Hub
You’re using the dbt Cloud IDE though, so you have access to dbt Cloud. Why would you not use its scheduler?