Binarization with SQL/macros

I’m not sure if this is the appropriate place to ask; please feel free to direct me somewhere else!

The problem I’m having

I have been struggling to convert a categorical column (containing a varying number of comma separated strings) into binary columns for use in classification algorhythms.

The context of why I’m trying to do this

This column has roughly 5000 unique values and I may be doing this to other columns as well, so I am seeking a dynamic approach

What I’ve already tried

Chat GPT and I have tried a number of various macros and this is the iteration I’m currently on:

{% macro binarize_column(column, prefix) %}

WITH split_data AS (
    LOWER({{ column }}) AS {{ column }},
    REGEXP_SPLIT_TO_TABLE(LOWER({{ column }}), ',') AS split_column
  FROM {{ ref('my_table') }}

unique_data AS (
  SELECT DISTINCT split_column
  FROM split_data

binned_data AS (
    {{ column }},
    {% for row in unique_data %}
        WHEN {{ row.split_column }} IS NOT NULL THEN 1
        ELSE 0
      END AS {{ prefix }} || '_' || {{ row.split_column }}
      {% if not loop.last %},{% endif %}
    {% endfor %}
  FROM split_data

FROM binned_data;

{% endmacro %}

and calling it via:

{{ config(materialized='table') }}

{{ binarize_column('my_column', 'my_prefix') }}

This is the resulting error message when trying to dbt run

22:34:35.609155 [error] [MainThread]: Database Error in model test (models/intermediate/my_model.sql)
22:34:35.609412 [error] [MainThread]:   syntax error at or near "FROM"
22:34:35.609717 [error] [MainThread]:   LINE 29:   FROM split_data