As GNU FreeDink upstream, I'd very much like to offer pre-built binaries: one (1) official, tested, current, distro-agnostic version of the game with its dependencies.
I'm actually already doing that for the Windows version.
One issue though: people have to trust me -- and my computer's integrity.
Reproducible builds could address that.
My release process is tightly controlled, but is my project reproducible? If not, what do I need? Let's check!
I quickly see that documentation is getting better, namely https://reproducible-builds.org/ :)
(The first docs I read on reproducibility looked more like a crazed date-o-phobic rant than an actual solution - plus now we have SOURCE_DATE_EPOCH implemented in gcc ;))
However I was left unsatisfied by the very high-level viewpoint and the lack of concrete examples.
The documentation points to various issues but is very vague about which tools are affected.
So let's do some tests!
Let's start with a trivial program:
$ cat > hello.c
#include <stdio.h>
int main(void) {
printf("Hello, world!\n");
}
OK, first does GCC compile this reproducibly?
I'm not sure because I heard of randomness in identifiers and such in the compilation process...
$ gcc-5 hello.c -o hello-5
$ md5sum hello-5
a00416d7392442321bad4afc5a461321 hello-5
$ gcc-5 hello.c -o hello-5
$ md5sum hello-5
a00416d7392442321bad4afc5a461321 hello-5
Cool, ELF compiler output is stable through time!
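The run-it-twice check above is easy to script. A minimal sketch, assuming a POSIX shell and sha256sum from coreutils; check_repro is a made-up helper name, not an existing tool:

```shell
# check_repro OUTPUT COMMAND [ARGS...]
# Hypothetical helper: runs the same build command twice and compares
# the checksums of the file it produces.
check_repro() {
    cr_out=$1
    shift
    "$@" || return 2                      # first build
    cr_first=$(sha256sum < "$cr_out")
    rm -f "$cr_out"
    "$@" || return 2                      # second build, same command
    cr_second=$(sha256sum < "$cr_out")
    if [ "$cr_first" = "$cr_second" ]; then
        echo "reproducible"
    else
        echo "differs"
        return 1
    fi
}
```

It would be called as: check_repro hello-5 gcc-5 hello.c -o hello-5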
Now do 2 versions of GCC compile a hello world identically?
$ gcc-6 hello.c -o hello-6
$ md5sum hello-6
f7f52c2f5f82fe2a95061a771a6c5acd hello-6
$ hexcompare hello-5 hello-6
[lots of red]
...
Well let's not get our hopes too high ;)
What about a trivial change in build options?
$ gcc-6 hello.c -lc -o hello-6
$ gcc-6 -lc hello.c -o hello-6b
$ md5sum hello-6 hello-6b
f7f52c2f5f82fe2a95061a771a6c5acd hello-6
f73ee6d8c3789fd8f899f5762025420e hello-6b
$ hexcompare hello-6 hello-6b
[lots of red]
...
OK, let's be very careful with build options then. What about 2 different build paths?
$ cd ..
$ cp -a repro/ repro2/
$ cd repro2/
$ gcc-6 hello.c -o hello-6
$ md5sum hello-6
f7f52c2f5f82fe2a95061a771a6c5acd hello-6
Basic compilation is stable across directories.
Next I tried building FreeDink identically from 2 different git clones.
Disappointment:
$ md5sum freedink/native/src/freedink freedink2/native/src/freedink
839ccd9180c72343e23e5d9e2e65e237 freedink/native/src/freedink
6d5dc6aab321fab01b424ac44c568dcf freedink2/native/src/freedink
$ hexcompare freedink2/native/src/freedink freedink/native/src/freedink
[lots of red]
Hmm, what about stripped versions?
$ strip freedink/native/src/freedink freedink2/native/src/freedink
$ md5sum freedink/native/src/freedink freedink2/native/src/freedink
415e96bb54456f3f2a759f404f18c711 freedink/native/src/freedink
e0702d798807c83d21f728106c9261ad freedink2/native/src/freedink
$ hexcompare freedink/native/src/freedink freedink2/native/src/freedink
[1 single red spot]
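When hexcompare isn't at hand, plain cmp -l gives a rough textual equivalent of the "red spots": it lists every differing byte position. A small sketch, assuming both files have the same size; diff_bytes is a made-up name:

```shell
# diff_bytes A B
# Prints how many byte positions differ between two files of equal size.
# `cmp -l` emits one line per differing byte (offset, then both values
# in octal), so counting the lines counts the differing bytes.
diff_bytes() {
    cmp -l "$1" "$2" | wc -l | tr -d ' '
}
```

Running cmp -l on the two stripped binaries directly also shows the offsets, which helps to tell one small cluster from differences scattered all over the file.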
OK, what's happening? diffoscope to the rescue:
$ diffoscope freedink/native/src/freedink freedink2/native/src/freedink
--- freedink/native/src/freedink
+++ freedink2/native/src/freedink
├── readelf --wide --notes {}
│ @@ -3,8 +3,8 @@
│ Owner Data size Description
│ GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
│ OS: Linux, ABI: 2.6.32
│
│ Displaying notes found in: .note.gnu.build-id
│ Owner Data size Description
│ GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
│ - Build ID: a689574d69072bb64b28ffb82547e126284713fa
│ + Build ID: d7be191a61e84648a58c18e9c108b3f3ce500302
What on earth is the Build ID and how is it computed?
After much digging, I find it's a plan from 2008, used for selecting matching detached debugging symbols.
https://fedoraproject.org/wiki/RolandMcGrath/BuildID is the most detailed overview/rationale I found.
It is supposed to be computed from parts of the binary. It's actually pretty resistant to changes, e.g. I could add the missing "return 0;" in my hello source and get the exact same Build ID!
On the other hand my FreeDink binaries do match except for the Build ID so there must be a catch.
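To compare Build IDs without running a full diffoscope each time, the readelf call that diffoscope shows above can be filtered down to just the ID. A sketch, assuming readelf from binutils is installed; both function names are made up:

```shell
# parse_build_id: reads `readelf --wide --notes` output on stdin and
# prints just the hexadecimal Build ID.
parse_build_id() {
    sed -n 's/^[[:space:]]*Build ID:[[:space:]]*//p'
}

# build_id_of FILE: convenience wrapper, needs readelf from binutils.
build_id_of() {
    readelf --wide --notes "$1" | parse_build_id
}
```

For instance, build_id_of freedink/native/src/freedink and build_id_of freedink2/native/src/freedink should print the two IDs from the diff above.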
Let's try our basic example with default ./configure CFLAGS:
$ (cd repro/ && gcc -g -O2 hello.c -o hello)
$ (cd repro/ && gcc -g -O2 hello.c -o hello-b)
$ md5sum repro/hello repro/hello-b
6b2cd79947d7c5ed2e505ddfce167116 repro/hello
6b2cd79947d7c5ed2e505ddfce167116 repro/hello-b
# => OK for now
$ (cd repro2/ && gcc -g -O2 hello.c -o hello)
$ md5sum repro2/hello
20b4d09d94de5840400be05bc76e4172 repro2/hello
$ strip repro/hello repro2/hello
$ diffoscope repro/hello repro2/hello
--- repro/hello
+++ repro2/hello
├── readelf --wide --notes {}
│ @@ -3,8 +3,8 @@
│ Owner Data size Description
│ GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
│ OS: Linux, ABI: 2.6.32
│
│ Displaying notes found in: .note.gnu.build-id
│ Owner Data size Description
│ GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
│ - Build ID: 462a3c613537bb57f20bd3ccbe6b7f6d2bdc72ba
│ + Build ID: b4b448cf93e7b541ad995075d2b688ef296bd88b
# => issue reproduced with -g -O2 and different build directories
$ (cd repro/ && gcc -O2 hello.c -o hello)
$ (cd repro2/ && gcc -O2 hello.c -o hello)
$ md5sum repro/hello repro2/hello
1571d45eb5807f7a074210be17caa87b repro/hello
1571d45eb5807f7a074210be17caa87b repro2/hello
# => culprit is not -O2, so culprit is -g
Bummer. So the Build ID must also be computed from the debug symbols, and it keeps that value even if I strip them afterwards :(
OK, so when https://reproducible-builds.org/docs/build-path/ says "Some tools will record the path of the source files in their output", that includes the compiler, and more importantly the effect carries over to the stripped executable.
Conclusion: apparently to achieve reproducible builds I need identical full build paths and to keep track of them.
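The two-directory comparison above can be wrapped in a helper so that trying another set of flags is a one-liner. A sketch, assuming a POSIX shell and sha256sum; same_in_both_dirs is a made-up name:

```shell
# same_in_both_dirs DIR1 DIR2 OUTPUT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND inside each directory and compares
# the checksum of the OUTPUT file produced there.
# Returns 0 if identical, 1 if different, 2 if a build failed.
same_in_both_dirs() {
    sd_dir1=$1 sd_dir2=$2 sd_out=$3
    shift 3
    sd_sum1=$(cd "$sd_dir1" && "$@" >/dev/null && sha256sum < "$sd_out") || return 2
    sd_sum2=$(cd "$sd_dir2" && "$@" >/dev/null && sha256sum < "$sd_out") || return 2
    [ "$sd_sum1" = "$sd_sum2" ]
}
```

With the directories used here, that would be: same_in_both_dirs repro repro2 hello gcc -g -O2 hello.c -o hello && echo identical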
What about Windows/MinGW btw?
$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc hello.c -o hello.exe
$ md5sum hello.exe
e0fa685f6866029b8e03f9f2837dc263 hello.exe
$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc hello.c -o hello.exe
$ md5sum hello.exe
df7566c0ac93ea4a0b53f4af83d7fbc9 hello.exe
$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc hello.c -o hello.exe
$ md5sum hello.exe
bbf4ab22cbe2df1ddc21d6203e506eb5 hello.exe
PE compiler output is not stable through time.
(any clue?)
OK, there's still a long road ahead of us...
There are lots of other questions.
Is autoconf output reproducible?
Does it actually matter if autoconf is reproducible if upstream is providing a pre-generated ./configure?
If not, what about all the documentation on making tarballs reproducible, along with the strip-nondeterminism tool?
Where do we draw the line between build and build environment?
What are the legal issues of distributing a docker-based build environment without every single matching distro source package?
That was my modest contribution to practical reproducible builds documentation for developers; I'd very much like to hear about more of it.
Who knows, maybe in the near future we'll get reproducible official builds for Eclipse, ZAP, JetBrains, Krita, Android SDK/NDK...
Windows executables include a "link time" field which you need to fix. If you build a PDB they will also contain the absolute path to that by default.
(There are probably some other issues; I haven't worked on Windows for a long time.)
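The "link time" field is presumably the TimeDateStamp in the COFF header of the PE file. A rough sketch for reading it with od, assuming a little-endian host (od -t u4 uses the host's byte order); pe_timestamp is a made-up name:

```shell
# pe_timestamp FILE
# Prints the COFF TimeDateStamp of a PE file as seconds since the epoch.
# Assumption: the host is little-endian, like the PE format itself.
pe_timestamp() {
    # The 4-byte offset of the "PE\0\0" signature is stored at 0x3C (60).
    pt_off=$(od -A n -t u4 -j 60 -N 4 "$1" | tr -d ' ')
    # TimeDateStamp follows the signature (4 bytes), Machine (2 bytes)
    # and NumberOfSections (2 bytes), i.e. 8 bytes after the signature.
    od -A n -t u4 -j $((pt_off + 8)) -N 4 "$1" | tr -d ' '
}
```

If two builds of hello.exe print different values here, the timestamp is at least one of the culprits.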
Thanks!
Stephen Kitt also pointed out that Debian Stretch's MinGW has improved reproducibility, provided you trigger it with SOURCE_DATE_EPOCH.
'-Wl,--no-insert-timestamp' helps too. I'm currently running additional tests; I'll probably post a follow-up.
The particular issue of varying Build IDs due to the debugging symbols can actually already be fixed with -fdebug-prefix-map.
I wish I had been told earlier.
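A sketch of what using -fdebug-prefix-map could look like: the option takes the form old=new, so mapping the current build directory to "." should keep the absolute path out of the debug info. prefix_map_flag is a made-up helper, and whether this alone yields identical Build IDs likely depends on the GCC version; I have not verified it against the tests above.

```shell
# prefix_map_flag
# Prints a -fdebug-prefix-map option that rewrites the current
# directory to "." in the debug info recorded by GCC.
prefix_map_flag() {
    printf '%s=%s=.\n' -fdebug-prefix-map "$PWD"
}
```

It would be used from inside each build directory, e.g.: gcc -g -O2 "$(prefix_map_flag)" hello.c -o hello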