
Reproducible Gitian Builds .. but not the same hash as bitcoincore.org

Bitcoin Stack Exchange


I've done a rebuild of 0.20.1 and I get the same results that you do. This indicates that a build dependency has been updated and now produces slightly different results than the version that was in use at the time of the release. The build dependency versions are not pinned, unlike the actual software dependencies. IIRC this is common for gitian builds, and attempting to rebuild even older releases will result in similar mismatches.

There is ongoing work to move to a different reproducible builds system called guix. This build system will pin the exact dependency versions, along with the option to build the entire toolchain from scratch. This should enable fully reproducible builds that can be replicated at any point in time.


Here is a more in depth explanation of this particular mismatch, copied from my response in this GitHub issue.

If you look at the gitian build results, you will see a file named bitcoin-0.20.1-x86_64-linux-gnu-debug.tar.gz. If you untar this file, there will be *.dbg files, e.g. bitcoind.dbg and bitcoin-qt.dbg. These *.dbg files contain the debug data for their respective binaries, i.e. bitcoind.dbg contains debug data for bitcoind.

To ensure that the correct dbg file is used with the correct binary, gcc embeds a checksum of the dbg file within the binary itself. This means that bitcoind contains a checksum of bitcoind.dbg, which prevents debugging bitcoind with some other dbg file. For example, if you told gdb that bitcoin-qt.dbg contained the debugging symbols for bitcoind, gdb would detect the mismatch and would not load debugging symbols from bitcoin-qt.dbg.

The debugging symbols are initially compiled into the bitcoind binary, but later during the gitian build they are removed and placed into the bitcoind.dbg file. However, because they are compiled into bitcoind initially, the debugging symbols affect the build ID that gcc embeds into the binary. The build ID is a hash of the compiled binary, including the debugging data.

So the end result is that the published binary contains two commitments to the debug symbols, but does not actually contain them itself.

You can read more about these separate debug files here: https://sourceware.org/gdb/onlinedocs/gdb/Separate-Debug-Files.html
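As a rough illustration of the first of these two commitments, the build ID is stored as an ELF note: three 4-byte integers (name size, descriptor size, type), then the owner name and the descriptor, each padded to a 4-byte boundary. The sketch below parses the raw bytes of a .note.gnu.build-id section. The function name `parse_build_id` is made up for this example, the default byte order is an assumption that holds for x86_64, and the sample note is assembled by hand from one of the build IDs that appears in the readelf output further down rather than read from a real binary.

```python
import struct

NT_GNU_BUILD_ID = 3  # ELF note type for the GNU build ID


def parse_build_id(note: bytes, byteorder: str = "<") -> str:
    """Return the build ID as hex from the raw bytes of a .note.gnu.build-id section.

    byteorder is a struct prefix: "<" (little-endian, assumed here for
    x86_64) or ">" (big-endian).
    """
    # Note header: name size, descriptor size, note type (4 bytes each).
    namesz, descsz, note_type = struct.unpack_from(byteorder + "III", note, 0)
    name_start = 12
    name = note[name_start:name_start + namesz]
    # The name is padded to a 4-byte boundary before the descriptor starts.
    desc_start = name_start + ((namesz + 3) & ~3)
    desc = note[desc_start:desc_start + descsz]
    if name != b"GNU\x00" or note_type != NT_GNU_BUILD_ID:
        raise ValueError("not a GNU build ID note")
    if len(desc) != descsz:
        raise ValueError("note is truncated")
    return desc.hex()


# Hand-built note for illustration only; the 20-byte descriptor (0x14, the
# "Data size" readelf reports) is one of the build IDs quoted in this answer.
example_note = (
    struct.pack("<III", 4, 20, NT_GNU_BUILD_ID)
    + b"GNU\x00"
    + bytes.fromhex("3a439a31a5157ff7052ed310050df5643a02ea3f")
)
print(parse_build_id(example_note))
```

On a real binary, the section bytes could be extracted first with something like objcopy's --dump-section option and then passed to this function.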

The crux of this issue is that the debug symbols from recent gitian builds differ from the debug symbols that were generated for the original release. This means that both the build ID and the debug symbol checksum that we find inside of bitcoind are different. This then results in the bitcoind hashes being different (as well as all of the other binaries). And of course that causes the tarfile hashes to be different.
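To see how far such a mismatch propagates, one could hash every file in an unpacked rebuild and compare it against the unpacked release. The following is a minimal sketch under assumptions: the helper names `sha256_file` and `mismatched_files` are invented for this example, and both tarballs are assumed to have been extracted into separate directories with the same layout.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(rebuilt_dir, release_dir):
    """Return relative paths present in both trees whose SHA-256 differs."""
    rebuilt_dir, release_dir = Path(rebuilt_dir), Path(release_dir)
    mismatches = []
    for path in sorted(rebuilt_dir.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(rebuilt_dir)
        other = release_dir / rel
        # Files that exist on only one side are ignored in this sketch.
        if other.is_file() and sha256_file(path) != sha256_file(other):
            mismatches.append(rel.as_posix())
    return mismatches
```

Because a hash covers every byte, even the few bytes of an embedded build ID or CRC are enough to put a binary on this list, and any tarball containing that binary then hashes differently as well.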


Here is the diff of the binaries that diffoscope generates:

--- bitcoind
+++ /mnt/archive/bitcoin/bitcoin-binaries/0.20.1/bitcoin-0.20.1/bin/bitcoind
├── readelf --wide --notes {}
│ @@ -1,15 +1,15 @@
│
│ Displaying notes found in: .note.ABI-tag
│ Owner Data size Description
│ GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag) OS: Linux, ABI: 3.2.0
│
│ Displaying notes found in: .note.gnu.build-id
│ Owner Data size Description
│ - GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: 6b464617f7f91fd270ac86f43ef4a58eeeedff19
│ + GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: 3a439a31a5157ff7052ed310050df5643a02ea3f
│
│ Displaying notes found in: .note.stapsdt
│ Owner Data size Description
│ stapsdt 0x00000036 NT_STAPSDT (SystemTap probe descriptors) Provider: libstdcxx
│ Name: throw
│ Location: 0x00000000006e550d, Base: 0x000000000086d140, Semaphore: 0x0000000000000000
│ Arguments: 8@%rdi 8@%rsi
├── readelf --wide --decompress --hex-dump=.gnu_debuglink {}
│ @@ -1,5 +1,5 @@
│
│ Hex dump of section '.gnu_debuglink':
│ 0x00000000 62697463 6f696e64 2e646267 00000000 bitcoind.dbg....
│ - 0x00000010 b25ceebb .\..
│ + 0x00000010 114d519d .MQ.

As you can see, there are only two differences here: one in the build ID, and one in the .gnu_debuglink section. From the documentation I linked earlier, we can see that the .gnu_debuglink section starts with the debug file name, followed by enough zero bytes to pad it to a 4-byte boundary. After that comes a 4-byte CRC checksum of the debug file, and it is this CRC checksum that differs.
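The layout just described can be decoded by hand. The sketch below splits the section bytes into the file name and the CRC, using the hex dump above as sample input. The function names `parse_debuglink` and `debug_file_crc` are made up for this example, the little-endian default is an assumption that fits x86_64, and using `zlib.crc32` relies on the assumption that the checksum routine given in the GDB documentation is the standard CRC-32 started from zero.

```python
import zlib


def parse_debuglink(section: bytes, byteorder: str = "little"):
    """Split raw .gnu_debuglink bytes into (debug file name, CRC)."""
    end = section.index(b"\x00")           # the file name is NUL-terminated
    name = section[:end].decode("ascii")
    crc_offset = (end + 1 + 3) & ~3        # skip padding to a 4-byte boundary
    crc = int.from_bytes(section[crc_offset:crc_offset + 4], byteorder)
    return name, crc


def debug_file_crc(path, chunk_size=1 << 20):
    """CRC-32 over the full contents of a debug file (assumed zlib-compatible)."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


# Section bytes copied from one side of the hex dump above.
section = bytes.fromhex("626974636f696e642e64626700000000" "b25ceebb")
name, crc = parse_debuglink(section)
print(name, hex(crc))
```

If the assumption about the CRC holds, calling `debug_file_crc` on the bitcoind.dbg from the same build should return the value stored in the binary, and calling it on the .dbg file from the other build should not.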

So why do the debug symbols differ here? Again, diffoscope can help us a bit.

--- bitcoind.dbg
+++ /mnt/archive/bitcoin/bitcoin-binaries/0.20.1/bitcoin-0.20.1/bin/bitcoind.dbg
├── readelf --wide --notes {}
│ @@ -1,15 +1,15 @@
│
│ Displaying notes found in: .note.ABI-tag
│ Owner Data size Description
│ GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag) OS: Linux, ABI: 3.2.0
│
│ Displaying notes found in: .note.gnu.build-id
│ Owner Data size Description
│ - GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: 6b464617f7f91fd270ac86f43ef4a58eeeedff19
│ + GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: 3a439a31a5157ff7052ed310050df5643a02ea3f
│
│ Displaying notes found in: .note.stapsdt
│ Owner Data size Description
│ stapsdt 0x00000036 NT_STAPSDT (SystemTap probe descriptors) Provider: libstdcxx
│ Name: throw
│ Location: 0x00000000006e550d, Base: 0x000000000086d140, Semaphore: 0x0000000000000000
│ Arguments: 8@%rdi 8@%rsi
├── readelf --wide --debug-dump=info {}
│┄ error from `readelf --wide --debug-dump=info {}`:
│┄ readelf: Error: /build/binutils/src/binutils-gdb/binutils/dwarf.c:1989: read LEB value is too large to store in destination variable
│┄ readelf: Error: /build/binutils/src/binutils-gdb/binutils/dwarf.c:1989: read LEB value is too large to store in destination variable
│┄ readelf: Error: /build/binutils/src/binutils-gdb/binutils/dwarf.c:1989: read LEB value is too large to store in destination variable
│┄ readelf: Error: /build/binutils/src/binutils-gdb/binutils/dwarf.c:1989: read LEB value is too large to store in destination variable
│┄ readelf: Error: /build/binutils/src/binutils-gdb/binutils/dwarf.c:1989: read LEB value is too large to store in destination variable
│ @@ -85052,36 +85052,36 @@
│ <29607> DW_AT_decl_line : 124
│ <29608> DW_AT_decl_column : 16
│ <29609> DW_AT_type : <0x28761>
│ <2960d> DW_AT_data_member_location: 12
│ <2><2960e>: Abbrev Number: 30 (DW_TAG_member)
│ <2960f> DW_AT_name : (indirect string, offset: 0x13aaf): __kind
│ <29613> DW_AT_decl_file : 108
│ - <29614> DW_AT_decl_line : 148
│ + <29614> DW_AT_decl_line : 128
│ <29615> DW_AT_decl_column : 7
│ <29616> DW_AT_type : <0x287a6>
│ <2961a> DW_AT_data_member_location: 16
│ <2><2961b>: Abbrev Number: 30 (DW_TAG_member)
│ <2961c> DW_AT_name : (indirect string, offset: 0x7a535): __spins
│ <29620> DW_AT_decl_file : 108
│ - <29621> DW_AT_decl_line : 154
│ + <29621> DW_AT_decl_line : 134
│ <29622> DW_AT_decl_column : 3
│ <29623> DW_AT_type : <0x2879a>
│ <29627> DW_AT_data_member_location: 20
...

There's a lot more output that I haven't included because it's pretty much all the same.

Now this isn't terribly helpful on its own, but we can see that for a number of declarations, the recorded line number (DW_AT_decl_line) differs by 20 lines.

To get some more information, I used dwarfdump. This is what it says in the new build for the two entries shown in the diffoscope output (the members __kind and __spins).

0x0002960e: DW_TAG_member
              DW_AT_name ("__kind")
              DW_AT_decl_file ("/usr/include/x86_64-linux-gnu/bits/thread-shared-types.h")
              DW_AT_decl_line (148)
              DW_AT_decl_column (0x07)
              DW_AT_type (0x000287a6 "int")
              DW_AT_data_member_location (0x10)

0x0002961b: DW_TAG_member
              DW_AT_name ("__spins")
              DW_AT_decl_file ("/usr/include/x86_64-linux-gnu/bits/thread-shared-types.h")
              DW_AT_decl_line (154)
              DW_AT_decl_column (0x03)
              DW_AT_type (0x0002879a "short int")
              DW_AT_data_member_location (0x14)

As you can see from the file name, these members come from headers installed on the build system, not from Bitcoin Core's own source. The path belongs to the system C library's thread type definitions (glibc, as far as I can tell), which the C++ stdlib headers pull in.

So what's happened is that the package providing these system headers has been updated in Ubuntu. Whatever updates happened have moved some declarations in header files that Bitcoin Core includes through its use of the C++ stdlib. In turn, compiling with those updated headers results in different debug symbols, because the recorded line numbers of those declarations have changed. This then results in gcc computing a different build ID and a different CRC checksum for the debug symbols. Finally, the resulting binaries are slightly different, which causes the hashes to mismatch.

