Hello bruno, I would say that for an example I have two columns named with phone_no and Country . I will state the similar values that I am dealing with.
004917642024148 Germany
017642025128. Germany
1762390345 France
+331762390345. France
Goes on
I have seen a similar problem dealt with Python code on net using library phonenumbers.
I just want to use this columns and create a transformed column with E.164 format .
Ok, this is what I thought, you can give your opinion about that:
Assumptions:
Your columns are called phone_number and country
The columns are of type string
The steps to transform a string into a e.164 phone number format are:
1 - Remove all non-numeric characters from phone_number
2 - Remove leading zeros from phone_number
3 - Check if the country code is included, if it is not, include it
4 - Include a plus sign at the end
My solution:
Create a macro in macros/ folder with this SQL code:
{% macro country_code(country) %}
case
when trim(lower({{ country }})) = "germany" then "49"
when trim(lower({{ country }})) = "france" then "33"
end
{% endmacro %}
{% macro e164(phone_number, country) %}
case
when
left(
ltrim(
regexp_replace(
{{ phone_number }}, '[^0-9]+', ''
), '0'
)
, length(
{{ country_code(country) }}
)
) = {{ country_code(country) }}
then
concat(
'+'
, ltrim(
regexp_replace(
{{ phone_number }}, '[^0-9]+', ''
), '0'
)
)
else
concat(
'+'
, {{ country_code(country) }}
, ltrim(
regexp_replace(
{{ phone_number }}, '[^0-9]+', ''
)
, '0'
)
)
end
{% endmacro %}
Then in your model you can do something like
select
some_id
, some_column
, {{ e164('phone_number', 'country') }} as e164_phone_number
from your_model
What you have to know
These functions work for BigQuery, if you are using another DB, then maybe you will have to adapt the functions
I didn’t care for performance, just wrote some idea, so it is not optimized, but you can work on that
In this solution, you must manually input the countries’ codes in a case when statement. Maybe using a dictionary is better suited? Don’t know
Sorry if the indentation is ugly, you can change that the way you prefer
If you use a Dataproc Cluster (as opposed to Dataproc serverless), you can install arbitrary packages from pip or conda, which would presumably include the phonenumbers library.
In a past life I used the .net port of the phonenumbers library and would definitely recommend using that instead of trying to normalise in SQL if you can!
Ahh my mistake! I saw BQ mentioned above and went with it. Bruno’s right, phonenumbers is available on Snowpark via anaconda: Snowflake Snowpark for Python
You can’t create a Python macro that you call from SQL, but you can create a Python model that returns the transformed data as a normal table (can’t be a view), and then you can ref it in downstream models, whether those models are written in SQL or Python.
The Overview section of the Python models docs does a good job of explaining their capabilities and limitations
The Python model approach is the most robust if your warehouse (Snowflake, Databricks, BigQuery) supports it, if not I think Bruno’s SQL solution is best suited.
Once you’ve formatted your numbers, you can validate them using tools like https://phonenumberformatter.app/ to ensure the E164 format is correct before pushing to production.
Thank you guys for your support i was able to solve it using a function in Snowflake with Python language CREATE OR REPLACE FUNCTION DATA_MART_PROD.CORE.FORMAT_PHONE_NUMBER(“NUMBER” VARCHAR(16777216), “COUNTRY” VARCHAR(16777216)) RETURNS VARCHAR(16777216) LANGUAGE PYTHON RUNTIME_VERSION = ‘3.8’ PACKAGES = (‘phonenumbers’) HANDLER = ‘udf’ AS ’ import phonenumbers def udf(number, country): try: parsed_number = phonenumbers.parse(number, country) return phonenumbers.format_number(parsed_number, phonenumbers.PhoneNumberFormat.E164) except phonenumbers.NumberParseException: return None ';