Good morning ! FYI, quite new in dbt thanks to the Coalesce conference , so sorry if I’m doing it wrong.
I was playing around with models and I have a question about data refactoring. I work in a company that deals with customers that have different languages, and depending on that language I wan’t to update my query followingly.
As instance I have a column called state that either container the string accepted or not accepted values by default, but I want to change them to accepté and non accepté if the customer is french.
I was wondering what was the best practice to do so. I was thinking of using a variable that specifies the customer’s language, but I wonder how to switch values in the query smoothly.
A first approach would be to have a if/else condition but if I would like to add more languages in the future (which will surely happen), I fear it would became a mess like this {% if var('language') == 'en' %} 'accepted' {% elif var('language') == 'fr' %} 'accepté' {% elif ... %} 'akzeptiert' ... { % endif %}.
What I was thinking of was to read value from an external YAML file, the way we can use the directive doc to load markdown files for documentation.
The final result would look like : {{ yaml('my_yaml_file')['state'][var('language')] }} which is much better.
Any thoughts about how to implement this ?
Thanks !
@Dynnammo Can you say more about your use case? If you did want to do this I’d agree with @obar1 that a seed is a good place to store a lookup table, but I’m not convinced that dbt is the place to do this at all.
dbt is normally used for analytics workloads, where instead of changing the display text you would be normalising it by flattening the status down to a single state_id column or something.
From your description, it’s unclear to me whether you’re changing the display for inclusion in rollup reports to stakeholders from multiple languages, or planning to use this to control the UI that’s shown to end users. Either is possible I guess, but the latter is not a common use case given the latency that normally exists in ETL workloads.
I didn’t know dbt as such feature, which appears to completely fulfill the use case.
I’m in the second use case. Since I’m not concerned (yet) by performances issues (I usually work with tables of less than 10000 lines), it should do the trick.