Rewriting Files in GCP
Note: even though this code is in Python, the same idea applies in JavaScript, Go, etc.
I wrote the following to copy a file from one Google Cloud Storage bucket to another:
src_blob = src_bucket.blob(file_name)
dest_blob = src_bucket.copy_blob(src_blob, dest_bucket, new_name=new_name)
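(In case it helps, here is roughly how src_bucket and dest_bucket get created. The client setup is standard google-cloud-storage; the bucket names are placeholders:)

from google.cloud import storage

client = storage.Client()
src_bucket = client.bucket("my-source-bucket")  # placeholder bucket names
dest_bucket = client.bucket("my-destination-bucket")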
But for bigger files (around 120 MB or so) I got the following error:
Copy spanning locations and/or storage classes could not complete within 30 seconds. Please use the Rewrite method (https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite) instead.
I noted that copy_blob has a timeout parameter, so why not try that?
src_blob = src_bucket.blob(file_name)
dest_blob = src_bucket.copy_blob(src_blob, dest_bucket, new_name=new_name, timeout=180)
And… same error:
Copy spanning locations and/or storage classes could not complete within 30 seconds. Please use the Rewrite method (https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite) instead.
Note that it still says 30 seconds, so it totally ignored my timeout parameter. That makes sense in hindsight: the timeout parameter only controls the client-side HTTP request, while the 30-second limit is enforced server-side on the copy operation itself. Looking at the rewrite docs at that link, I noted that they only cover the raw JSON API, not the Python client I was using. After some digging and StackOverflow reading, I came up with this snippet:
src_blob = src_bucket.blob(file_name)
dest_blob = dest_bucket.blob(new_name)  # create the destination under its new name

# rewrite() copies in chunks; it keeps returning a token until the copy is complete
rewrite_token = False
while True:
    rewrite_token, bytes_rewritten, bytes_to_rewrite = dest_blob.rewrite(
        src_blob, token=rewrite_token
    )
    print(
        f"\t{new_name}: Progress so far: {bytes_rewritten}/{bytes_to_rewrite} bytes."
    )
    if not rewrite_token:  # no token left means the rewrite finished
        break
That will print the progress of each rewrite call… and with my 120 MB files, it finished in a single call. Overall I found this faster than copy_blob, even for small files.
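If you copy files in more than one place, the loop wraps neatly into a helper. This is just a sketch built on the snippet above; the rewrite_blob name and the verbose flag are mine, not part of the library:

from google.cloud import storage


def rewrite_blob(src_bucket, dest_bucket, file_name, new_name=None, verbose=False):
    """Copy a blob between buckets via the Rewrite API, looping until done."""
    new_name = new_name or file_name
    src_blob = src_bucket.blob(file_name)
    dest_blob = dest_bucket.blob(new_name)

    token = None  # None starts a fresh rewrite operation
    while True:
        token, bytes_rewritten, bytes_to_rewrite = dest_blob.rewrite(
            src_blob, token=token
        )
        if verbose:
            print(f"\t{new_name}: {bytes_rewritten}/{bytes_to_rewrite} bytes.")
        if not token:  # the final call returns no token
            return dest_blob

Usage then looks like (again with placeholder bucket and file names):

client = storage.Client()
copied = rewrite_blob(
    client.bucket("my-source-bucket"),
    client.bucket("my-destination-bucket"),
    "big-file.bin",
    verbose=True,
)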