A really common task in web analytics is to extract the marketing fields - source, medium, campaign, and content - from a web session. To do so, you only need the landing page and referrer of the session.
However, HOW to extract the marketing fields from the landing page and referrer is not necessarily straightforward. Many people simple look for utm parameters in the landing page (like utm_source
). That’s not the best way to get the most accurate information possible.
For instance, if there are no utm parameters, but there is a gclid
parameter, we know that the user came from a google ads campaign, so the source should be “google” and the medium should be “cpc” (at least if we’re trying to match Google Analytic’s naming convention).
Similarly, if there’s a fbclid
parameter, we know the user came from a paid facebook campaign, even if the marketing team didn’t include utm parameters in their ad’s url.
To capture as much marketing information as possible, I created the below macro. Unlike Google Analytics, it distinguishes between “direct” and “unkown” sources when a given row does not have a landing page. It also standardize the names of sources (“m.facebook.com” → “facebook”).
It takes a source
table with landing_site
, referring_site
, and id
(of each row) columns and outputs a table with the following fields:
id
marketing_source,
marketing_medium,
marketing_campaign,
marketing_content,
marketing_search_term,
landing_page,
referrer_host,
fbclid,
gclid
You can then join the results (using the id
field) with your original table to enrich your data with these marketing fields.
Hopefully this is helpful in sessionization and attribution packages like segment. We use it in our Shopify Analytics tool for attributing Shopify orders to marketing efforts.
If anyone has any suggestions for how to make this better, either more user/dbt-friendly, or more accurate, please let me know!
{%- macro get_marketing_fields(source, landing_site, referring_site, id, pass_through_fields) -%}
-- TODO probably should do this per column instead of per table
with init_data as (
select
{{ id }} as id,
{% for field in pass_through_fields %}
{{field}},
{% endfor %}
{{ referring_site }} as referring_site,
{{ landing_site }} as landing_site
from {{ source }}
),
url_parsing as (
select
{{ dbt_utils.get_url_host(field=referring_site) }} as referrer_host,
lower({{ dbt_utils.get_url_path(field='landing_site') }}) as landing_page,
lower({{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='utm_source') }}) as utm_source,
lower({{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='utm_medium') }}) as utm_medium,
lower({{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='utm_campaign') }}) as utm_campaign,
lower({{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='utm_content') }}) as utm_content,
lower({{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='utm_term') }}) as utm_term,
lower({{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='q') }}) as search_param_q,
{{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='gclid') }} as gclid,
{{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='dclid') }} as dclid,
{{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='fbclid') }} as fbclid,
{{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='mc_cid') }} as mc_cid,
{{ dbt_utils.get_url_parameter(field='landing_site', url_parameter='mc_eid') }} as mc_eid,
*
from init_data
),
url_parsing_2 as (
select
{{ get_url_root_domain(referrer_host) }} as referrer_root_domain,
*
from url_parsing
),
marketing_sources as (
select
case
when utm_source is not null then utm_source
-- search engines
when (gclid is not null) or (dclid is not null) then 'google'
when referrer_root_domain like '%google%' then 'google'
when referrer_root_domain like '%bing%' then 'bing'
when referrer_root_domain like '%yahoo%' then 'yahoo'
when referrer_root_domain like '%duckduckgo%' then 'duckduckgo'
-- social networks
when referrer_root_domain like '%facebook%' then 'facebook'
when referrer_root_domain like '%instagram%' then 'instagram'
when referrer_root_domain like '%igshopping%' then 'instagram'
when referrer_root_domain like '%twitter%' then 'twitter'
when referrer_root_domain like '%linkedin%' then 'linkedin'
when fbclid is not null then 'facebook'
-- emails
when (mc_cid is not null) or (mc_eid is not null) then 'mailchimp'
when referring_site like '%com.google.android.gm%' then 'gmail'
-- unkown
when landing_site is null then 'unkown'
-- direct
when referrer_root_domain is null then 'direct'
-- referral
else regexp_replace(referrer_host, '^www\.', '')
end as marketing_source,
*
from url_parsing_2
),
marketing_mediums as (
select
case
when utm_medium is not null then utm_medium
-- search engines
when (gclid is not null) or (dclid is not null) then 'cpc'
when marketing_source = 'google' then 'organic'
when marketing_source = 'bing' then 'organic'
when marketing_source = 'yahoo' then 'organic'
when marketing_source = 'duckduckgo' then 'organic'
-- social networks
when fbclid is not null then 'cpc'
when referrer_root_domain like '%igshopping%' then 'paid'
when marketing_source = 'facebook' then 'organic'
when marketing_source = 'instagram' then 'organic'
when marketing_source = 'twitter' then 'organic'
when marketing_source = 'linkedin' then 'organic'
-- emails
when marketing_source = 'mailchimp' then 'email'
when marketing_source = 'gmail' then 'email'
-- unkown
when marketing_source = 'unkown' then 'unkown'
-- direct
when marketing_source = 'direct' then 'none'
-- referral
else 'referral'
end as marketing_medium,
*
from marketing_sources
),
marketing_search_terms as (
select
case
when utm_term is not null then utm_term
-- search engines
when marketing_source = 'bing' then search_param_q
when marketing_source = 'yahoo' then search_param_q
when marketing_source = 'duckduckgo' then search_param_q
end as marketing_search_term,
*
from marketing_mediums
)
select
id,
{% for field in pass_through_fields %}
{{field}},
{% endfor %}
marketing_source,
marketing_medium,
utm_campaign as marketing_campaign,
utm_content as marketing_content,
marketing_search_term,
landing_page,
referrer_host,
fbclid,
gclid
from marketing_search_terms
{%- endmacro -%}