I am trying to convert phone numbers to E.164 format

The problem

I have a table with two columns: customer phone numbers and country. I need to convert the phone numbers to E.164 format.

The context of why I’m trying to do this

I looked for a phone number macro in the dbt utils package that I could call in a select statement, but I could not find one.

Alternatively, I could use Python to return only the transformed column and then use it in a SQL model like a macro.

Is there any way to do this? Please let me know how it can be solved.

What I’ve already tried

I searched the dbt utils package for a phone number macro or UDF and could not find one.

Hi @sivareddy3244, you can create your own custom macro!

If you provide some examples of your column values, I can help.

Hello Bruno, as an example, I have two columns named phone_no and Country. Here are values similar to the ones I am dealing with:

004917642024148 Germany
017642025128. Germany
1762390345 France
+331762390345. France
And so on.

I have seen a similar problem solved online with Python code using the phonenumbers library.

I just want to use these columns to create a transformed column in E.164 format.

Thank you in advance.

Do you have an example using a Python library that you can show? We can convert that logic to SQL and use a Jinja macro to compile it.

Hello Bruno.

Here is an example in Python. I need to implement the same thing in dbt.

import phonenumbers

# Parse a Romanian national number, then format it as E.164
my_number = phonenumbers.parse("0721234567", "RO")
e164_f = phonenumbers.format_number(my_number, phonenumbers.PhoneNumberFormat.E164)
print(e164_f)

OK, here is what I came up with; let me know what you think:

Assumptions:

  • Your columns are called phone_number and country
  • The columns are of type string
  • The steps to transform a string into an E.164 phone number are:
    1 - Remove all non-numeric characters from phone_number
    2 - Remove leading zeros from phone_number
    3 - Check if the country code is included; if it is not, add it
    4 - Add a plus sign at the beginning
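
As a rough illustration, the four steps can be sketched in plain Python using only the standard library. The function name `to_e164` and the `calling_code` parameter are made up for this sketch; it is a naive normalisation under the assumptions listed above, not a replacement for a proper phone number library.

```python
import re


def to_e164(phone_number: str, calling_code: str) -> str:
    """Naive E.164 normalisation that follows the four steps literally."""
    # Step 1: keep digits only.
    digits = re.sub(r"[^0-9]+", "", phone_number)
    # Step 2: drop leading zeros (a "00" international prefix or a "0" trunk prefix).
    digits = digits.lstrip("0")
    # Step 3: prepend the calling code unless the number already starts with it.
    if not digits.startswith(calling_code):
        digits = calling_code + digits
    # Step 4: an E.164 number starts with a plus sign.
    return "+" + digits


# Sample values from earlier in the thread, paired with their calling codes.
samples = [
    ("004917642024148", "49"),
    ("017642025128.", "49"),
    ("1762390345", "33"),
    ("+331762390345.", "33"),
]
for raw, code in samples:
    print(raw, "->", to_e164(raw, code))
```

One caveat of this heuristic: a national number whose first digits happen to equal the calling code is treated as if it already included the country code, so step 3 can misclassify some inputs.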

My solution:

Create a macro in macros/ folder with this SQL code:

{% macro country_code(country) %}

    case
        when trim(lower({{ country }})) = 'germany' then '49'
        when trim(lower({{ country }})) = 'france' then '33'
    end

{% endmacro %}


{% macro e164(phone_number, country) %}

  case
      when
         left(
              ltrim(
                regexp_replace(
                    {{ phone_number }}, '[^0-9]+', ''
                ), '0'
              )
              , length(
                  {{ country_code(country) }}
              )
          ) = {{ country_code(country) }}
      then
          concat(
              '+'
              , ltrim(
                  regexp_replace(
                      {{ phone_number }}, '[^0-9]+', ''
                  ), '0'
              )
          )
      else
          concat(
              '+'
              , {{ country_code(country) }}
              , ltrim(
                  regexp_replace(
                      {{ phone_number }}, '[^0-9]+', ''
                  )
                  , '0'
              )
          )
      end

{% endmacro %}

Then in your model you can do something like

select
    some_id
    , some_column
    ,  {{ e164('phone_number', 'country') }} as e164_phone_number
from your_model

What you have to know

  • These functions work on BigQuery; if you are using another database, you may have to adapt them
  • I did not optimize for performance; this is just a first idea that you can build on
  • In this solution, you must manually list each country’s calling code in a case when statement. A dictionary might be a better fit, but I am not sure
  • Sorry if the indentation is ugly; feel free to change it however you prefer
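
To illustrate the dictionary idea, here is a small Python sketch that builds the same case expression from a mapping instead of hand-written branches. The names `CALLING_CODES` and `render_country_code_case` are hypothetical, and the mapping only contains the two countries from the thread. In a dbt project the same loop would more likely be written in Jinja inside the macro.

```python
# Hypothetical mapping from lower-cased country name to calling code.
CALLING_CODES = {"germany": "49", "france": "33"}


def render_country_code_case(column: str, codes: dict) -> str:
    """Build a SQL CASE expression with one WHEN branch per dictionary entry."""
    lines = ["case"]
    for country, code in codes.items():
        # Keys and values are trusted constants here, so no escaping is applied.
        lines.append(f"    when trim(lower({column})) = '{country}' then '{code}'")
    lines.append("end")
    return "\n".join(lines)


print(render_country_code_case("country", CALLING_CODES))
```

Adding a country then means adding one dictionary entry rather than editing the SQL by hand.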

Thanks a lot, Bruno. I will work on it and write back to let you know how it goes.

Sure, let me know if you have any problems! And if it works, I would appreciate it if you could mark it as the solution :smiley:

For sure. If I have any problems I will definitely write back. If it works, I will mark it as the solution.

It is possible to run Python workloads on BigQuery via Dataproc: Python models | dbt Developer Hub

If you use a Dataproc Cluster (as opposed to Dataproc serverless), you can install arbitrary packages from pip or conda, which would presumably include the phonenumbers library.

In a past life I used the .net port of the phonenumbers library and would definitely recommend using that instead of trying to normalise in SQL if you can!

I am using Snowflake.

Joel’s suggestion works in Snowflake as well, via Snowpark! You can check it out at the same link he shared.

Ahh my mistake! I saw BQ mentioned above and went with it. Bruno’s right, phonenumbers is available on Snowpark via anaconda: Snowflake Snowpark for Python

I am so used to SQL models that I went straight to the SQL solution :laughing:

But I agree this is a very good case for Python models, and if you are not familiar with them, it is a good first problem to solve while learning.

But after installing the phonenumbers library, can we use this Python model via ref, or as a macro like the one used above in SQL?

As far as I can see, we can’t use a Python model in other models as an import or a macro.

Do you have a better suggestion? Please correct me if I am wrong. Thanks.

You can’t create a Python macro that you call from SQL, but you can create a Python model that returns the transformed data as a normal table (can’t be a view), and then you can ref it in downstream models, whether those models are written in SQL or Python.
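
As a rough sketch of what such a Python model could look like on Snowflake: the file name, the upstream model name `customers`, the upper-case column names, and the `REGION_BY_COUNTRY` mapping are all assumptions for illustration. Only `phonenumbers.parse`, `phonenumbers.format_number`, and the `model(dbt, session)` entry point come from the libraries themselves.

```python
# models/customers_e164.py  (hypothetical file name)

# Assumed mapping from the free-text country column to the two-letter
# region codes that phonenumbers.parse expects (e.g. "RO" in the example above).
REGION_BY_COUNTRY = {"germany": "DE", "france": "FR"}


def to_e164(raw_number, country):
    """Return the number in E.164 format, or None if it cannot be parsed."""
    # Imported here so the package only has to exist where the model runs.
    import phonenumbers

    region = REGION_BY_COUNTRY.get(str(country).strip().lower())
    try:
        parsed = phonenumbers.parse(str(raw_number), region)
    except phonenumbers.NumberParseException:
        return None
    return phonenumbers.format_number(parsed, phonenumbers.PhoneNumberFormat.E164)


def model(dbt, session):
    # Python models cannot be views, so materialize as a table.
    dbt.config(materialized="table", packages=["phonenumbers", "pandas"])

    # "customers" is a placeholder for the upstream model with the raw columns.
    df = dbt.ref("customers").to_pandas()

    # Assumption: Snowflake returns unquoted column names in upper case.
    df["E164_PHONE_NUMBER"] = [
        to_e164(number, country)
        for number, country in zip(df["PHONE_NO"], df["COUNTRY"])
    ]
    return df
```

A downstream SQL model could then select from `{{ ref('customers_e164') }}` like any other table. Note that parsing alone does not check that a number is actually valid for its region; `phonenumbers.is_valid_number` could be added if that matters.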

The Overview section of the Python models docs does a good job of explaining their capabilities and limitations.
