Schema file invalid as 'models' is not a list error performing dbt parse

I am getting the following error when running dbt parse against a set of models, as shown below. The dbt_project.yml file is the same as the one generated during dbt set-up. I am not able to work out how to debug this error.

Before getting that error, I hit a different one saying that the version number must be an integer. I set the value to 2, and then the error above cropped up.

The yml property file at models.…\dbt_project.yml is invalid because its ‘version:’ tag must be an integer (e.g. version: 2). 1.0.0 is not an integer. Please consult the documentation for more information on yml property file syntax:

It’s failing only for this particular set of models; the same project file works for the others. Any pointers on where to look for the error?

Hi @vkarthik21, a couple of comments:

  • It’s very uncommon to call a yaml file in the models directory dbt_project.yml. There should only be a single dbt_project.yml file, at the root of your project. It’s not illegal to give your model config files that name (they can be called whatever you want), but it’s very confusing. Here’s how we normally name our files.
  • With that said, it sounds like you have a yaml file that looks something like this:
version: 2 
sources: 
  - name: something
    ... #rest of a source definition here
models: 
#nothing at all here

That is, you have a models key which doesn’t contain a list - it’s probably empty but I guess it could also contain a dictionary or some other structure. Can you post the offending file?
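To make the "not a list" distinction concrete, here is a minimal Python sketch of the kind of shape check that seems to be behind the error. It is not dbt's actual code: the function name `models_key_problem` and the sample dictionaries are made up for illustration, and the dictionaries stand in for what a YAML parser would return for each file.

```python
from typing import Any, Optional


def models_key_problem(parsed: dict[str, Any]) -> Optional[str]:
    """Describe what is wrong with the 'models' key, or return None if its shape is fine."""
    if "models" not in parsed:
        return None  # no 'models' key at all passes this particular check
    value = parsed["models"]
    if isinstance(value, list):
        return None
    if value is None:
        return "'models' is empty (parsed as null), not a list"
    return f"'models' is a {type(value).__name__}, not a list"


# Hypothetical parsed contents of three differently shaped yaml files.
empty_models = {"version": 2, "models": None}  # "models:" with nothing under it
dict_models = {"version": 2, "models": {"+persist_docs": {"relation": True}}}
list_models = {"version": 2, "models": [{"name": "my_model"}]}

for doc in (empty_models, dict_models, list_models):
    print(models_key_problem(doc))
```

Only the third shape passes. An empty key and a dictionary (the shape a project-level config block has) are both reported as "not a list".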

Thank you so much Joe for your reply. Actually, we have a dbt_project.yml in each dataset sub-directory.

I am attaching the offending dbt_project.yml file for reference. In the example below, the first error I get says the version number should be an integer. If I change it to 2, I then get the error saying models should be a list. Please let me know if you need additional details. I’m not sure why it’s happening.

name: 'dev_project'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'default'

# These configurations specify where dbt should look for different types of files.
# The `source-paths` config, for example, states that models in this project can be
# found in the "models/" directory. You probably won't need to change these!
source-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
data-paths: ["data"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"  # directory which will store compiled SQL files
clean-targets:         # directories to be removed by `dbt clean`
    - "target"
    - "dbt_modules"


# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models

# In this example config, we tell dbt to build all models in the example/ directory
# as tables. These settings can be overridden in the individual model files
# using the `{{ config(...) }}` macro.
models:
  +persist_docs:
    relation: true
    columns: true

This isn’t how dbt works. It expects to find a single dbt_project.yml file in the root of a project, and it expects all yaml files it finds in the models/ directory to be config/property files which look like this. This is why it’s asking you to set version: 2, which is the correct value for a yaml file inside the models directory.

What are you trying to do that means you want to put multiple dbt_project.yml files throughout your directory? Did you know that you can configure behaviour at the directory level from the root of the project?

For example:

#dbt_project.yml

name: 'dev_project'
version: '1.0.0'
config-version: 2

# This setting configures which "profile" dbt uses for this project.
profile: 'default'

#...

models:
  subdirectory_1:
    +persist_docs:
      relation: true
      columns: true
  subdirectory_2:
    +persist_docs:
      relation: false
      columns: false

Thank you. What you are saying makes perfect sense. I hate to say it, but ‘it was working before’ with the same set-up. We have a shell script task in an Azure DevOps pipeline that runs dbt parse against each dataset. A month ago everything was working fine, and then it stopped working. I went back to that changeset but don’t see any differences in the .yml files. A few more models (datasets) have been added to the project, but the set-up is essentially the same.

The actual error that we see when the dbt parsing step runs in the pipeline is shown below. While trying to debug the issue on my local system, I ran into the two errors from my original post, so I’m not really sure what is happening.

I am unable to reproduce the error from the DevOps pipeline server and am stuck at this point.

2022-12-22T05:44:26.6568883Z Traceback (most recent call last):
2022-12-22T05:44:26.6571316Z File "/home/vsts/.local/lib/python3.10/site-packages/logbook/handlers.py", line 216, in handle
2022-12-22T05:44:26.6572019Z self.emit(record)
2022-12-22T05:44:26.6572722Z File "/home/vsts/.local/lib/python3.10/site-packages/dbt/logger.py", line 472, in emit
2022-12-22T05:44:26.6575523Z assert len(self._msg_buffer) < self._bufmax,
2022-12-22T05:44:26.6576312Z AssertionError: too many messages received before initilization!
2022-12-22T05:44:26.6577214Z Logged from file /home/vsts/.local/lib/python3.10/site-packages/logbook/concurrency.py, line 141

The thing is, the whole architecture is set up so that it loops through each individual directory and parses it, as a first step to ensure there are no inconsistencies in the models.

In terms of how it’s used, all the models for a given dataset are first copied over to the cloud. We then have an Airflow Composer job (per dataset) running a Kubernetes pod, which in turn runs the dbt commands for a given job (against the given dataset).

So it sounds to me like you actually have dozens of tiny little dbt projects - a dbt project is defined as any directory that contains a dbt_project.yml file at its root. If you’re iterating through each directory one at a time, then I can see why you would be able to do this.
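Going by that definition, a quick way to see how many projects a repository really contains is to list every directory that holds a dbt_project.yml. A minimal Python sketch using only the standard library; the function name `find_dbt_project_dirs` and the `datasets` path in the usage comment are hypothetical:

```python
from pathlib import Path


def find_dbt_project_dirs(root: Path) -> list[Path]:
    """Return every directory at or below root that contains a dbt_project.yml file."""
    # rglob also matches a dbt_project.yml sitting directly in root itself.
    return sorted(path.parent for path in root.rglob("dbt_project.yml"))


# Usage, where "datasets" is a placeholder for wherever the per-dataset folders live:
#   for project_dir in find_dbt_project_dirs(Path("datasets")):
#       print(project_dir)
```

If the root directory shows up in the result alongside its own subdirectories, there are projects nested inside another project.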

Does this mean you have also not added any extra .yml files?

Because you have blurred out the original screenshot, I can’t see what paths your files are located at. I suspect it’s something like this though:

(dbt-prod) joel@Joel-Labes joel-sandbox % dbt parse
...
Parsing Error
    The schema file at joel-sandbox/subdirectory/dbt_project.yml is invalid because the value of 'models' is not a list

This would happen when you’re invoking dbt from the root instead of one of your subdirectories, which means it’s hitting a second dbt_project.yml file and reading it as a property file, which is expected to have a different shape.
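One way to rule out the "invoked from the wrong directory" case is to make the working directory explicit for every run. The sketch below is an assumption about how such a loop could look, not a copy of the pipeline script discussed above. The helper name `run_in_each` is hypothetical, and it only relies on `subprocess.run` with its `cwd` argument and on the `dbt parse` command already used in this thread.

```python
import subprocess
from pathlib import Path
from typing import Sequence


def run_in_each(
    project_dirs: Sequence[Path],
    command: Sequence[str] = ("dbt", "parse"),
) -> dict[Path, int]:
    """Run command once per directory, using that directory as the working directory."""
    exit_codes: dict[Path, int] = {}
    for project_dir in project_dirs:
        # cwd makes each directory the project root for that invocation.
        completed = subprocess.run(list(command), cwd=project_dir)
        exit_codes[project_dir] = completed.returncode
    return exit_codes


# Usage, with placeholder directory names:
#   codes = run_in_each([Path("datasets/sales"), Path("datasets/finance")])
#   failed = [d for d, code in codes.items() if code != 0]
```

Running the same loop locally and in the pipeline makes it easier to compare the two environments, since each invocation sees exactly one dbt_project.yml at its root.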