I prototyped block-based environment destruction a while back (Broforce-like).
What I came up with was using a single scroller to hold both states of the stage: copy the "filled" section of the scroller data to a RAM work copy, then replace it with "empty" data as blocks get destroyed.
You can use a similar technique. As your background is static, you likely won't need a RAM work copy; just edit VRAM on the go.
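To make the two-state idea concrete, here is a rough sketch in plain C. It is not tied to any particular library: every name and size in it (`source_tile`, `work_map`, `stage_reset`, `destroy_block`, the `STAGE_*` and `BLOCK_*` constants) is made up for illustration, and it assumes blocks aligned on tile boundaries to keep it short. The source map holds the "filled" stage in its left half and the "empty" stage in its right half; destroying a block copies the matching tiles from the empty half over the working copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All sizes and names below are hypothetical, for illustration only. */
#define STAGE_W 8              /* stage width in tiles               */
#define STAGE_H 4              /* stage height in tiles              */
#define BLOCK_W 2              /* destructible block width in tiles  */
#define BLOCK_H 2              /* destructible block height in tiles */
#define MAP_W   (STAGE_W * 2)  /* "filled" half, then "empty" half   */

/* Stand-in for reading the two-state map from ROM.
 * Columns 0..STAGE_W-1 are the filled stage, STAGE_W..MAP_W-1 the empty one.
 * The tile numbers are dummy values so the two halves can be told apart. */
static uint16_t source_tile(unsigned x, unsigned y)
{
    assert(x < MAP_W && y < STAGE_H);
    uint16_t base = (x < STAGE_W) ? 0x100 : 0x200;
    return (uint16_t)(base + y * STAGE_W + (x % STAGE_W));
}

/* Working copy of the visible stage. With a static background this
 * buffer could be skipped and the writes sent straight to VRAM. */
static uint16_t work_map[STAGE_H][STAGE_W];

/* Start of stage: copy the "filled" half into the working copy. */
void stage_reset(void)
{
    for (unsigned y = 0; y < STAGE_H; y++)
        for (unsigned x = 0; x < STAGE_W; x++)
            work_map[y][x] = source_tile(x, y);
}

/* Replace one block (coordinates in block units) with its "empty" tiles.
 * Returns false if the block lies outside the stage. */
bool destroy_block(unsigned bx, unsigned by)
{
    if (bx >= STAGE_W / BLOCK_W || by >= STAGE_H / BLOCK_H)
        return false;

    for (unsigned ty = 0; ty < BLOCK_H; ty++) {
        for (unsigned tx = 0; tx < BLOCK_W; tx++) {
            unsigned x = bx * BLOCK_W + tx;
            unsigned y = by * BLOCK_H + ty;
            /* Same position, but read from the "empty" half of the map. */
            work_map[y][x] = source_tile(STAGE_W + x, y);
        }
    }
    return true;
}
```

Call `stage_reset()` once when the stage loads, then `destroy_block(bx, by)` whenever a block is hit. Keeping both states in one map means the "empty" art is authored in the same editor pass as the "filled" art, and destruction becomes a plain tile copy.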
Given that the block size/alignment doesn't fit exact tiles, you will need to produce a set of tiles to match every combination (96 tiles from what I can see of the alignment in the video, maybe more to account for the border shadow).
Test raw map:

Play result:
