Macro for Extracting Marketing Fields

A really common task in web analytics is to extract the marketing fields - source, medium, campaign, and content - from a web session. To do so, you only need the landing page and referrer of the session.

However, HOW to extract the marketing fields from the landing page and referrer is not necessarily straightforward. Many people simple look for utm parameters in the landing page (like utm_source). That’s not the best way to get the most accurate information possible.

For instance, if there are no utm parameters, but there is a gclid parameter, we know that the user came from a google ads campaign, so the source should be “google” and the medium should be “cpc” (at least if we’re trying to match Google Analytic’s naming convention).

Similarly, if there’s a fbclid parameter, we know the user came from a paid facebook campaign, even if the marketing team didn’t include utm parameters in their ad’s url.

To capture as much marketing information as possible, I created the below macro. Unlike Google Analytics, it distinguishes between “direct” and “unkown” sources when a given row does not have a landing page. It also standardize the names of sources (“m.facebook.com” → “facebook”).

It takes a source table with landing_site, referring_site , and id (of each row) columns and outputs a table with the following fields:

        id
        marketing_source,
        marketing_medium,
        marketing_campaign,
        marketing_content,
        marketing_search_term,
        landing_page,
        referrer_host,
        fbclid,
        gclid

You can then join the results (using the id field) with your original table to enrich your data with these marketing fields.

Hopefully this is helpful in sessionization and attribution packages like segment. We use it in our Shopify Analytics tool for attributing Shopify orders to marketing efforts.

If anyone has any suggestions for how to make this better, either more user/dbt-friendly, or more accurate, please let me know!

{%- macro get_marketing_fields(source, landing_site, referring_site, id, pass_through_fields) -%}
    -- TODO probably should do this per column instead of per table

    with init_data as (
        select 
            {{ id }} as id,

            {% for field in pass_through_fields %}
            {{field}},
            {% endfor %}

            {{ referring_site }} as referring_site,
            {{ landing_site }} as landing_site
        from {{ source }}
    ),
    
    url_parsing as (
        select
            {{ dbt_utils.get_url_host(field=referring_site) }} as referrer_host,
            lower({{ dbt_utils.get_url_path(field='landing_site') }}) as landing_page,
            lower({{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='utm_source') }}) as utm_source,
            lower({{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='utm_medium') }}) as utm_medium,
            lower({{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='utm_campaign') }}) as utm_campaign,
            lower({{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='utm_content') }}) as utm_content,
            lower({{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='utm_term') }}) as utm_term,
            lower({{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='q') }}) as search_param_q,
            {{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='gclid') }} as gclid,
            {{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='dclid') }} as dclid,
            {{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='fbclid') }} as fbclid,
            {{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='mc_cid') }} as mc_cid,
            {{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='mc_eid') }} as mc_eid,
            *
        from init_data
    ),

    url_parsing_2 as (
        select
            {{ get_url_root_domain(referrer_host) }} as referrer_root_domain,
            *
        from url_parsing
    ),

    marketing_sources as (
        select
            case
            when utm_source is not null then utm_source
            -- search engines
            when (gclid is not null) or (dclid is not null) then 'google'
            when referrer_root_domain like '%google%' then 'google'
            when referrer_root_domain like '%bing%' then 'bing'
            when referrer_root_domain like '%yahoo%' then 'yahoo'
            when referrer_root_domain like '%duckduckgo%' then 'duckduckgo'
            -- social networks
            when referrer_root_domain like '%facebook%' then 'facebook'
            when referrer_root_domain like '%instagram%' then 'instagram'
            when referrer_root_domain like '%igshopping%' then 'instagram'
            when referrer_root_domain like '%twitter%' then 'twitter'
            when referrer_root_domain like '%linkedin%' then 'linkedin'
            when fbclid is not null then 'facebook'
            -- emails
            when (mc_cid is not null) or (mc_eid is not null) then 'mailchimp'
            when referring_site like '%com.google.android.gm%' then 'gmail'
            -- unkown
            when landing_site is null then 'unkown'
            -- direct
            when referrer_root_domain is null then 'direct'
            -- referral
            else regexp_replace(referrer_host, '^www\.', '')
            end as marketing_source,
            *
        from url_parsing_2
    ),

    marketing_mediums as (
        select
            case
            when utm_medium is not null then utm_medium
            -- search engines
            when (gclid is not null) or (dclid is not null) then 'cpc'
            when marketing_source = 'google' then 'organic'
            when marketing_source = 'bing' then 'organic'
            when marketing_source = 'yahoo' then 'organic'
            when marketing_source = 'duckduckgo' then 'organic'
            -- social networks
            when fbclid is not null then 'cpc'
            when referrer_root_domain like '%igshopping%' then 'paid'
            when marketing_source = 'facebook' then 'organic'
            when marketing_source = 'instagram' then 'organic'
            when marketing_source = 'twitter' then 'organic'
            when marketing_source = 'linkedin' then 'organic'
            -- emails
            when marketing_source = 'mailchimp' then 'email'
            when marketing_source = 'gmail'  then 'email'
            -- unkown
            when marketing_source = 'unkown' then 'unkown'
            -- direct
            when marketing_source = 'direct' then 'none'
            -- referral
            else 'referral'
            end as marketing_medium,
            *
        from marketing_sources
    ),

    marketing_search_terms as (
        select
            case
            when utm_term is not null then utm_term
            -- search engines
            when marketing_source = 'bing' then search_param_q
            when marketing_source = 'yahoo' then search_param_q
            when marketing_source = 'duckduckgo' then search_param_q
            end as marketing_search_term,
            *
        from marketing_mediums
    )

    select
        id,

        {% for field in pass_through_fields %}
        {{field}},
        {% endfor %}

        marketing_source,
        marketing_medium,
        utm_campaign as marketing_campaign,
        utm_content as marketing_content,
        marketing_search_term,
        landing_page,
        referrer_host,
        fbclid,
        gclid
        
    from marketing_search_terms

{%- endmacro -%}

Here’s the get_url_host macro used above:

{%- macro get_url_root_domain(hostname ) -%}
    substring(referrer_host from  '([^\.]+)\.[a-z]+$')
{%- endmacro -%}

Hi Ted,

Thanks for the feedback. But the macros in dbt-utils just extract parameters from urls. They don’t do the logic of turning url params + referrers + other information into actual sources.

I am using the dbt_utils.get_url_parameter macro though.

Thanks!
Kevin

I see that now! I’m sorry I jumped to conclusions. Thanks for sharing this!

1 Like