Replacing content in thousands of files? No problem!
In recent weeks and months, the FSFE Web Team has been doing some heavy work on the FSFE website. We moved and replaced thousands of files and their respective links to improve the structure of a historically grown website (19+ years, 23243 files, almost 39k commits). But how do you do that most efficiently in a version-controlled system like Git?
In our scenarios, the steps usually looked like this:
- Move or rename a file.
- Replace all links to its old location with links to the new one.
For the first step, using the included git mv is perfectly fine.
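For illustration, here is a minimal sketch of this first step in a throwaway repository. The paths old/page.html and new/page.html are just the placeholder names used in this post, and the commit identity is a dummy one.

```shell
#!/usr/bin/env bash
# Sketch: rename a page with git mv inside a throwaway repository.
# old/page.html and new/page.html are placeholder paths for illustration.
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q

# Create and commit a page at its old location
mkdir -p old
echo '<h1>Some page</h1>' > old/page.html
git add old/page.html
git -c user.name=Demo -c user.email=demo@example.org commit -q -m "Add old page"

# Create the target directory first, then let git mv move the file
# in the working tree and stage the rename in one step.
mkdir -p new
git mv old/page.html new/page.html

git status --short
```

Git records this as a staged rename, so the history of the page stays easy to follow.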
For the second step, we would usually need a combination of grep and sed, e.g.:
grep -lr "/old/page.html" . | xargs sed -i 's;/old/page.html;/new/page.html;g'
This has a few major flaws:
- It also searches the .git directory, where we do not want to edit files directly.
After some research, I found git-sed, basically a Bash script in the git-extras project. With some modifications (pull request pending), it's the perfect tool for mass search and replacement.
It solves all of the above problems:
- It uses git grep, which ignores the .git/ directory and is much faster because it uses Git's index.
You can simply install the git-extras package, which also contains a few other scripts.
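To see the idea without installing anything, the following is a rough plain-Git sketch of a tracked-files-only search and replace. It is not git-sed's actual source; it assumes GNU sed and GNU xargs, replace_tracked is a hypothetical helper name, and the sample files are made up.

```shell
#!/usr/bin/env bash
# Rough plain-Git sketch of a repository-wide search and replace.
# Assumes GNU sed (-E, -i) and GNU xargs (-0, -r).
set -euo pipefail

# replace_tracked is a hypothetical helper, not part of git-extras.
# Usage: replace_tracked <ERE pattern> <replacement> [pathspec...]
# Neither argument may contain ';', which is used as the sed delimiter.
replace_tracked() {
  local search=$1 replacement=$2
  shift 2
  # git grep only looks at tracked files, so .git/ is never touched.
  # -l prints matching file names, -z separates them with NUL bytes
  # (safe for spaces), -E uses the same ERE dialect as sed -E.
  # git grep exits with 1 when nothing matches; treat only that as success.
  { git grep -lzE -e "$search" -- "$@" || [ $? -eq 1 ]; } |
    xargs -0 -r sed -E -i "s;${search};${replacement};g"
}

# Demo in a throwaway repository with two tracked files
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p news about
echo '<a href="/old/page.html">Old page</a>' > news/item.html
echo '<a href="/old/page.html">Old page</a>' > about/index.html
git add news about

# Only files below news/ are changed
replace_tracked '/old/page\.html' '/new/page.html' news/
cat news/item.html about/index.html
```

Because the file list comes from git grep, untracked files and the .git/ directory are left alone, and the optional pathspec arguments narrow the search to parts of the tree.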
I opted to use it standalone: I downloaded the shell file, put it in a directory that is in my $PATH, and removed one dependency on a script that is only available in git-extras (see my aforementioned PR). For instance, you could copy git-sed.sh to /usr/local/bin/ and make it executable. To be able to call it as git sed, add the following to your ~/.gitconfig:
[alias]
# git-sed.sh must be executable and in your $PATH
sed = !git-sed.sh
After installing git-sed, the command above would become:
git sed -f g "/old/page.html" "/new/page.html"
My modifications also enable extended regular expressions, including capture groups with back-references, so I hope they will be merged soon. With them, more advanced replacements are possible:
# Capture group: save the path as \1 (ERE has no lazy .*?, so [^"]* stops at the closing quote)
git sed -f g 'http://fsfe\.org(/[^"]*\.html)' 'https://fsfe.org\1'
# Optional tokens (.html is optional here)
git sed -f g "/old/page(\.html)?" "/new/page.html"
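Before running a regex over thousands of files, it can help to try it on a sample line first. The sketch below feeds made-up HTML to GNU sed -E directly, without git-sed. POSIX extended regular expressions have no lazy quantifier like .*?, so the first pattern uses [^"]* to stop at the closing quote of the href attribute.

```shell
#!/usr/bin/env bash
# Preview two ERE replacements on sample input with GNU sed -E (no files are changed).
# The sample lines are made up for illustration.
set -euo pipefail

sample='<a href="http://fsfe.org/about/index.html">About</a> <a href="http://fsfe.org/news/news.html">News</a>'

# Capture group: \1 holds the path after the host name.
https_links=$(printf '%s\n' "$sample" |
  sed -E 's;http://fsfe\.org(/[^"]*\.html);https://fsfe.org\1;g')
echo "$https_links"

# Optional group: matches /old/page and /old/page.html alike
optional=$(printf '%s\n' 'see /old/page and /old/page.html' |
  sed -E 's;/old/page(\.html)?;/new/page.html;g')
echo "$optional"
```

Keep in mind that a pattern like /old/page(\.html)? also matches the start of /old/pages, so a quick preview like this is worth doing before applying it to a whole repository.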
And if you would like to limit git-sed to a certain directory, e.g. news/, that's no big deal either:
git sed -f g "oldstring" "newstring" -- news/
You may have noticed the -f flag with the g argument. People used to sed know that g replaces all occurrences of the pattern in each line, not only the first one. You could also use gi if you want a case-insensitive search and replace.
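The effect of these flags is easy to check with plain GNU sed on sample input, independent of git-sed. Note that the i flag is a GNU sed extension.

```shell
#!/usr/bin/env bash
# How the s/// flags behave, shown with plain GNU sed on sample lines.
set -euo pipefail

line='Page page PAGE page'

first_only=$(echo "$line" | sed 's/page/site/')    # only the first match on the line
all=$(echo "$line" | sed 's/page/site/g')          # every match on the line
all_nocase=$(echo "$line" | sed 's/page/site/gi')  # every match, ignoring case (GNU extension)

# Without g, sed still replaces the first match on *each* line
per_line=$(printf 'page page\npage page\n' | sed 's/page/site/')

printf '%s\n' "$first_only" "$all" "$all_nocase" "$per_line"
```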
As you can see, git-sed is a real time and nerve saver when making mass changes to your repositories. Of course, there is still room for improvement. For instance, it could be useful to support Perl-compatible regular expressions (PCRE) in the grep and sed steps, which would allow look-aheads and look-behinds. I encourage you to try git-sed and to send suggestions upstream to improve this handy tool.