Migrating a Segment dataset in BigQuery
We have several Segment datasets in BigQuery which we wanted to rename. Some we named wrong in the past,
and people had trouble finding the right dataset, on others we used special characters incompatible with BigQuery schema,
and Segment picked a weird name like github_org_webhooks_1bwroarvrkz0c26mibaiunxoqh0
.
This begs a question - what's the right way to do it?
I couldn't find it in the docs, but Segment's support promptly responded I needed to take 4 steps: disable the sync, rename the dataset, rename it in Segment & reenable the sync.
It turned out a little more complicated - so here's some extra detail.
1. Disable the warehouse destination for the source you wish to rename
Make sure no sync is running, and just click the switch. Easy enough.
2. Rename the dataset in BigQuery
Mind you, renaming the dataset is not really a thing. But you can definitely create a new dataset with the desired name, and copy the data over.
This is easy enough in the UI. Just click your project, and create a new dataset with the new name. Then go to the old dataset, and copy it over to the new one.
3. Re-create the views
Segment BigQuery exports come with raw data with duplicates in tables, and then deduplicated data in views. The views also have an annoying property of only showing the last 60 days of data. The copy above doesn't copy the views, so we need to recreate them.
This might get a bit tedious to do by hand, so let's automate it.
3.1 Create a BigQuery integration
On the left side in the Integrations, create a new BigQuery integration and copy your service account json with sufficient permissions.
3.2 Create all the views
4. Rename the destination table in Segment
You need to go to your source, then Settings, then SQL Settings, and change the Schema Name.
5. Switch it back on
Now go back to the BigQuery destination settings, and switch it back on for the desired source.
6. The gotcha - IAM
I did all this, and waited for the next sync to happen. It failed on insufficient permissions.
Now perhaps it's because I didn't follow the guide right during setup or it changed since I first did it,
but our service account for segment had a role BigQuery User
, and it managed to create its own tables so far,
but after the migration it failed on writing to existing ones.
Now the guide recommends BigQuery Data Owner
and BigQuery Job User
, and once I updated this, the next sync ran fine. It also backfilled the day it missed because of the permission issue.
So just double check if you have the right IAM settings for the service account.
That's it!