
Following last week's .zed format reverse-engineered specification, Loïc Dachary contributed a POC extractor!
It's available at http://www.dachary.org/loic/zed/. It can list non-encrypted metadata without a password, and extract files with a password (or .pem file).
Leveraging python-olefile and pycrypto, only 500 lines of code (test cases excluded) are enough to implement it :)

Posted Tue Sep 19 21:28:14 2017 Tags:

TL;DR: I reverse-engineered the .zed encrypted archive format.
Following a clean-room design, I'm providing a description that can be implemented by a third-party.
Interested? :)

(reference version at: https://www.beuc.net/zed/)

.zed archive file format

Introduction

Archives with the .zed extension are conceptually similar to an encrypted .zip file.

In addition to a specific format, .zed files support multiple users: files are encrypted using the archive master key, which itself is encrypted for each user and/or authentication method (password, RSA key through certificate or PKCS#11 token). Metadata such as filenames is partially encrypted.

.zed archives are used as stand-alone or attached to e-mails with the help of a MS Outlook plugin. A variant, which is not covered here, can encrypt/decrypt MS Windows folders on the fly like ecryptfs.

In the spirit of academic and independent research this document provides a description of the file format and encryption algorithms for this encrypted file archive.

See the conventions section for conventions and acronyms used in this document.

Structure overview

The .zed file format is composed of several layers.

  • The main container uses the Compound File Binary format (MS-CFB), which is notably used by MS Office 97-2003 .doc files. It contains several streams:

    • Metadata stream: in OLE Property Set format (MS-OLEPS), contains 2 blobs in a specific Type-Length-Value (TLV) format:

      • _ctlfile: global archive properties and access list
        It is obfuscated by means of static-key AES encryption.
        The properties include archive initial filename and a global IV.
        A global encryption key is itself encrypted in each user entry.

      • _catalog: file list
        Contains each file's metadata, indexed with a 15-byte identifier.
        Directories are supported.
        Full filename is encrypted using AES.
        File extension is (redundantly) stored in clear, and so are file metadata such as modification time.

    • Each file in the archive is compressed with zlib and encrypted with the standard AES algorithm, in a separate stream.
      Several encryption schemes and key sizes are supported.
      The file stream is split in chunks of 512 bytes, individually encrypted.

    • Optional streams contain additional metadata as well as pictures to display in the application background ("watermarks"). They are not discussed here.

Or as a diagram:

+----------------------------------------------------------------------------------------------------+
| .zed archive (MS-CFB)                                                                              |
|                                                                                                    |
|  stream #1                         stream #2                       stream #3...                    |
| +------------------------------+  +---------------------------+  +---------------------------+     |
| | metadata (MS-OLEPS)          |  | encryption (AES)          |  | encryption (AES)          |     |
| |                              |  | 512-bytes chunks          |  | 512-bytes chunks          |     |
| | +--------------------------+ |  |                           |  |                           |     |
| | | obfuscation (static key) | |  | +-----------------------+ |  | +-----------------------+ |     |
| | | +----------------------+ | |  |-| compression (zlib)    |-|  |-| compression (zlib)    |-|     |
| | | |_ctlfile (TLV)        | | |  | |                       | |  | |                       | | ... |
| | | +----------------------+ | |  | | +---------------+     | |  | | +---------------+     | |     | 
| | +--------------------------+ |  | | | file contents |     | |  | | | file contents |     | |     |
| |                              |  | | |               |     | |  | | |               |     | |     |
| | +--------------------------+ |  |-| +---------------+     |-|  |-| +---------------+     |-|     |
| | | _catalog (TLV)           | |  | |                       | |  | |                       | |     |
| | +--------------------------+ |  | +-----------------------+ |  | +-----------------------+ |     |
| +------------------------------+  +---------------------------+  +---------------------------+     |
+----------------------------------------------------------------------------------------------------+

Encryption schemes

Several AES key sizes are supported, such as 128 and 256 bits.

The Cipher Block Chaining (CBC) block cipher mode of operation is used to decrypt multiple AES 16-byte blocks, which means an initialisation vector (IV) is stored in clear along with the ciphertext.

All filenames and file contents are encrypted using the same encryption mode, key and IV (e.g. if you remove and re-add a file in the archive, the resulting stream will be identical).

No cleartext padding is used during encryption; instead, several end-of-stream handlers are available, so the ciphertext has exactly the size of the cleartext (e.g. the size of the compressed file).

The following variants were identified in the 'encryption_mode' field.

STREAM

This is the end-of-stream handler for:

  • obfuscated metadata encrypted with static AES key
  • filenames and files in archives with 'encryption_mode' set to "AES-CBC-STREAM"
  • any AES ciphertext of size < 16 bytes, regardless of encryption mode

This end-of-stream handler is apparently specific to the .zed format, and is applied when the cleartext does not end on a 16-byte boundary; in this case special processing is performed on the last partial 16-byte block.

The encryption and decryption phases are identical: let's assume the last partial block of cleartext (for encryption) or ciphertext (for decryption) was appended after all the complete 16-byte blocks of ciphertext:

  • the second-to-last block of the ciphertext is encrypted in AES-ECB mode (i.e. block cipher encryption only, without XORing with the IV)

  • then XOR-ed with the last partial block (hence truncated to the length of the partial block)

In either case, if the full ciphertext is less than one AES block (< 16 bytes), then the IV is used instead of the second-to-last block.
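
The handling of the last partial block can be sketched as follows. This is a minimal sketch under stated assumptions, not the reference implementation: `ecb_encrypt_block` is a hypothetical callable that encrypts a single 16-byte block with the relevant AES key (it could be built on any crypto library's AES-ECB primitive), and the complete 16-byte blocks are assumed to have been handled with regular CBC beforehand.

```python
BLOCK = 16  # AES block size in bytes

def stream_tail(tail, previous_block, ecb_encrypt_block):
    """Encrypt or decrypt the last partial block (the operation is symmetric).

    tail: the trailing 1..15 bytes (cleartext when encrypting,
        ciphertext when decrypting).
    previous_block: the last complete ciphertext block, or the IV when
        the whole message is shorter than one block.
    ecb_encrypt_block: hypothetical callable, 16 bytes in, 16 bytes out.
    """
    if not 0 < len(tail) < BLOCK:
        raise ValueError("tail must be a partial block (1..15 bytes)")
    if len(previous_block) != BLOCK:
        raise ValueError("previous block (or IV) must be 16 bytes")
    # Block cipher encryption only, no XOR with the IV.
    keystream = ecb_encrypt_block(previous_block)
    # zip() stops at the end of the tail, which truncates the keystream.
    return bytes(t ^ k for t, k in zip(tail, keystream))
```

Since the same XOR is applied in both directions, calling the function twice with the same `previous_block` gives back the original bytes.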

CTS

CTS or CipherText Stealing is the end-of-stream handler for:

  • filenames and files in archives with 'encryption_mode' set to "AES-CBC-CTS".
    • exception: if the size of the ciphertext is < 16 bytes, then "STREAM" is used instead.

It matches the CBC-CS3 variant as described in Recommendation for Block Cipher Modes of Operation: Three Variants of Ciphertext Stealing for CBC Mode.

Empty cleartext

Since empty filenames or metadata are invalid, and since all files are compressed (resulting in a minimum 8-byte zlib cleartext), no empty cleartext was encrypted in the archive.

metadata stream

It is named 05356861616161716149656b7a6565636e576a33317a7868304e63 (hexadecimal), i.e. the character with code 5 followed by '5haaaaqaIekzeecnWj31zxh0Nc' (ASCII).
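
As a quick check, the hexadecimal name above can be decoded with a few lines of Python. The variable names are only illustrative.

```python
# Hexadecimal stream name as given in this document.
name_hex = "05356861616161716149656b7a6565636e576a33317a7868304e63"

# Decode to text: one control character (code 5) followed by ASCII characters.
metadata_stream_name = bytes.fromhex(name_hex).decode("ascii")
print(repr(metadata_stream_name))
```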

The format used is OLE Property Set (MS-OLEPS).

It introduces 2 property names "_ctlfile" (index 3) and "_catalog" (index 4), and 2 instances of said properties each containing an application-specific VT_BLOB (type 0x0041).

_ctlfile: obfuscated global properties and access list

This subpart is stored under index 3 ("_ctlfile") of the MS-OLEPS metadata.

It consists of:

  • static delimiter 0765921A2A0774534752073361719300 (hexadecimal) followed by 0100 (hexadecimal) (18 bytes total)
  • 16-byte IV
  • ciphertext
  • 1 uint32be representing the length of all the above
  • static delimiter 0765921A2A0774534752073361719300 (hexadecimal) followed by "ZoneCentral (R)" (ASCII) and a NUL byte (32 bytes total)

The ciphertext is encrypted with AES-CBC "STREAM" mode using 128-bit static key 37F13CF81C780AF26B6A52654F794AEF (hexadecimal) and the prepended IV so as to obfuscate the access list. The ciphertext is continuous and not split in chunks (unlike files), even when it is larger than 512 bytes.
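
The layout above can be split into its parts as in the following sketch. It assumes that "the length of all the above" counts the 18-byte header, the IV and the ciphertext; that reading is an interpretation of the description, and the function name `split_ctlfile` is hypothetical. Decryption itself is left out.

```python
import struct

DELIMITER = bytes.fromhex("0765921A2A0774534752073361719300")
HEADER = DELIMITER + bytes.fromhex("0100")      # 18 bytes
TRAILER = DELIMITER + b"ZoneCentral (R)\x00"    # 32 bytes

def split_ctlfile(blob):
    """Return (iv, ciphertext) extracted from the raw _ctlfile blob."""
    if not blob.startswith(HEADER) or not blob.endswith(TRAILER):
        raise ValueError("unexpected _ctlfile delimiters")
    body = blob[:-len(TRAILER)]
    # uint32be length field, stored right before the trailing delimiter.
    (declared,) = struct.unpack(">I", body[-4:])
    payload = body[:-4]
    # Assumption: the length covers header + IV + ciphertext.
    if declared != len(payload):
        raise ValueError("length field does not match the payload size")
    iv = payload[len(HEADER):len(HEADER) + 16]
    ciphertext = payload[len(HEADER) + 16:]
    return iv, ciphertext
```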

The decrypted text contains properties in a TLV format as described in _ctlfile TLV:

  • global archive properties as a 'fileprops' structure,

  • extra archive properties as an 'archive_extraprops' structure

  • users access list as a series of 'passworduser' and 'rsauser' entries.

Archives may include "mandatory" users that cannot be removed. They are typically used to add an enterprise-wide recovery RSA key to all archives. Extreme care must be taken to protect this key, as it can decrypt all past archives generated from within that company.

_catalog: file list

This subpart is stored under index 4 ("_catalog") of the MS-OLEPS metadata.

It contains a series of 'fileprops' TLV structures, one for each file or directory.

The file hierarchy can be reconstructed by checking the 'parent_directory_id' field of each file entry. If 'parent_directory_id' is 0 then the file is located at the top-level of the hierarchy, otherwise it's located under the directory with the matching 'file_id'.
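
Rebuilding the hierarchy could look like the sketch below. It assumes the catalog has already been parsed into a dictionary mapping each 'file_id' to a (parent identifier, name) pair, that a top-level parent is represented by all-zero bytes, and that "/" is an acceptable separator; the function name `build_paths` is hypothetical.

```python
def build_paths(entries):
    """Map each file_id to its full path.

    entries: dict mapping file_id (bytes) -> (parent_id (bytes), name (str)).
    """
    def path_of(file_id, seen=()):
        if file_id in seen:
            raise ValueError("cycle in parent identifiers")
        parent_id, name = entries[file_id]
        # Assumption: an all-zero (or empty) parent id means top level.
        if not any(parent_id):
            return name
        return path_of(parent_id, seen + (file_id,)) + "/" + name

    return {file_id: path_of(file_id) for file_id in entries}
```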

TLV format

This format is a series of fields:

  • 4 bytes for Type (specified as a 4-byte hexadecimal below)
  • 4 bytes for value Length (uint32be)
  • Value

Value semantics depend on its Type. It may contain an uint32be integer, a UTF-16LE string, a character sequence, or an inner TLV structure.

Unless otherwise noted, TLV structures appear once.

Some fields are optional and may not be present at all (e.g. 'archive_createdwith').

Some fields are unique within a structure (e.g. 'files_iv'), others may be repeated within a structure to form a list (e.g. 'fileprops' and 'passworduser').

The following top-level types have been identified, and are detailed in the next sections:

  • 80110600: fileprops, used for the file list as well as for the global archive properties
  • 001b0600: archive_extraprops
  • 80140600: accesslist

Some additional unidentified types may be present.
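
A parser for this format fits in a few lines. The sketch below assumes the Type is shown in this document in the same byte order as it is stored, and it only recurses into the types that this document describes as TLV structures; `parse_tlv` and `parse_tree` are hypothetical names, and unknown types are returned as raw bytes.

```python
import struct

# Types described in this document as containing an inner TLV series.
CONTAINER_TYPES = {"80110600", "001b0600", "80140600", "80610600", "80620600"}

def parse_tlv(data):
    """Yield (type_hex, value_bytes) for each field of a TLV series."""
    offset = 0
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError("truncated TLV header")
        type_hex = data[offset:offset + 4].hex()
        (length,) = struct.unpack(">I", data[offset + 4:offset + 8])
        end = offset + 8 + length
        if end > len(data):
            raise ValueError("truncated TLV value")
        yield type_hex, data[offset + 8:end]
        offset = end

def parse_tree(data):
    """Parse recursively, expanding known container types into lists."""
    return [
        (type_hex, parse_tree(value) if type_hex in CONTAINER_TYPES else value)
        for type_hex, value in parse_tlv(data)
    ]
```

Keeping unknown types as raw bytes means the parser does not fail on the unidentified fields mentioned above.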

_ctlfile TLV

  • 80110600: fileprops (TLV structure): global archive properties
    • 00230400: archive_pathname (UTF-16LE string): initial archive filename (past versions also leaked the full pathname of the initial archive)
    • 80270200: encryption_mode (uint32be): 103 for "AES-CBC-STREAM", 104 for "AES-CBC-CTS"
    • 80260200: encryption_strength (uint32be): AES key size, in bytes (e.g. 32 means AES with a 256-bit key)
    • 80280500: files_iv (sequence of bytes): global IV for all filenames and file contents
  • 001b0600: archive_extraprops (TLV structure): additional archive properties (optional)
    • 00c40500: archive_creationtime (FILETIME): date and time when archive was initially created (optional)
    • 00c00400: archive_createdwith (UTF-16LE string): uuid-like structure describing the application that initialized the archive (optional)
      {00000188-1000-3CA8-8868-36F59DEFD14D} is Zed! Free 1.0.188.
  • 80140600: accesslist (TLV structure): describe the users, their key encryption and their permissions
    • 80610600: passworduser (TLV structure): user identified by password (0 or more)
    • 80620600: rsauser (TLV structure): user identified by RSA key (via file or PKCS#11 token) (0 or more)
    • Fields common to passworduser and rsauser:
      • 80710400: login (UTF-16LE string): user name
      • 80720300: login_md5 (sequence of bytes): used by the application to search for a user name
      • 807e0100: priv1 (uchar): user privileges; present and set to 1 when user is admin (optional)
      • 00830200: priv2 (uint32be): user privileges; present and set to 2 when user is admin, present and set to 5 when user is marked as mandatory, e.g. for recovery keys (optional)
      • 80740500: files_key_ciphertext (sequence of bytes): the archive encryption key, itself encrypted
      • 00840500: user_creationtime (FILETIME): date and time when the user was added to the archive
    • passworduser-specific fields:
      • 80760500: pbe_salt (sequence of bytes): salt for PBE
      • 80770200: pbe_iter (uint32be): number of iterations for PBE
      • 80780200: pkcs12_hashfunc (uint32be): hash function used for PBE and PBA key derivation
      • 80790500: pba_checksum (sequence of bytes): password derived with PBA to check for password validity
      • 807a0500: pba_salt (sequence of bytes): salt for PBA
      • 807b0200: pba_iter (uint32be): number of iterations for PBA
    • rsauser-specific fields:
      • 807d0500: certificate (sequence of bytes): user X509 certificate in DER format

_catalog TLV

  • 80110600: fileprops (TLV structure): describe the archive files (0 or more)
    • 80300500: file_id (sequence of bytes): a 16-byte unique identifier
    • 80310400: filename_halfanon (UTF-16LE string): half-anonymized filename, e.g. File1.txt (leaking filename extension)
    • 00380500: filename_ciphertext (sequence of bytes): encrypted filename; may have a trailing NUL byte once decrypted
    • 80330500: file_size (uint64le): decompressed file size in bytes
    • 80340500: file_creationtime (FILETIME): file creation date and time
    • 80350500: file_lastwritetime (FILETIME): file last modification date and time
    • 80360500: file_lastaccesstime (FILETIME): file last access date and time
    • 00370500: parent_directory_id (sequence of bytes): file_id of the parent directory, 0 is top-level
    • 80320100: is_dir (uint32be): 1 if entry is directory (optional)

Decrypting the archive AES key

rsauser

The user accessing the archive will be authenticated by comparing his/her X509 certificate with the one stored in the 'certificate' field using DER format.

The 'files_key_ciphertext' field is then decrypted using the PKCS#1 v1.5 encryption mechanism, with the private key that matches the user certificate.

passworduser

An intermediary user key, a user IV and an integrity checksum will be derived from the user password, using the deprecated PKCS#12 method as described in RFC 7292 appendix B.

Note: this is not PKCS#5 (nor PBKDF1/PBKDF2), this is an incompatible method from PKCS#12 that notably does not use HMAC.

The 'pkcs12_hashfunc' field defines the underlying hash function. The following values have been identified:

  • 21: SHA-1
  • 22: SHA-256

PBA - Password-based authentication

The user accessing the archive will be authenticated by deriving an 8-byte sequence from his/her password.

The parameters for the derivation function are:

  • ID: 3
  • 'pba_salt': the salt, typically an 8-byte random sequence
  • 'pba_iter': the iteration count, typically 200000

The derivation is checked against 'pba_checksum'.

PBE - Password-based encryption

Once the user is identified, 2 new values are derived from the password with different parameters to produce the IV and the key decryption key, with the same hash function:

  • 'pbe_salt': the salt, typically an 8-byte random sequence
  • 'pbe_iter': the iteration count, typically 100000

The parameters specific to user key are:

  • ID: 1
  • size: 32

The user key needs to be truncated to a length of 'encryption_strength', as specified in bytes in the archive properties.

The parameters specific to user IV are:

  • ID: 2
  • size: 16

Once the key decryption key and the IV are derived, 'files_key_ciphertext' is decrypted using AES CBC, with PKCS#7 padding.
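
The PKCS#12 derivation (RFC 7292 appendix B.2) only needs a hash function, so it can be sketched with the standard library alone. Note the assumptions: the function and helper names are hypothetical, and the way the password is encoded into bytes is not stated above. PKCS#12 itself uses a NUL-terminated UTF-16BE string, so the caller would have to try that (or another encoding) and validate it against 'pba_checksum'. The final AES-CBC decryption of 'files_key_ciphertext' is not shown.

```python
import hashlib

HASHES = {21: "sha1", 22: "sha256"}  # 'pkcs12_hashfunc' values listed above

def pkcs12_kdf(hash_name, password, salt, iterations, key_id, size):
    """RFC 7292 appendix B.2 derivation; password is already encoded bytes."""
    probe = hashlib.new(hash_name)
    u, v = probe.digest_size, probe.block_size

    def stretch(data):
        # Repeat data to fill a whole number of v-byte blocks.
        if not data:
            return b""
        n = v * ((len(data) + v - 1) // v)
        return (data * (n // len(data) + 1))[:n]

    d = bytes([key_id]) * v
    i_buf = bytearray(stretch(salt) + stretch(password))
    out = b""
    while len(out) < size:
        a = hashlib.new(hash_name, d + bytes(i_buf)).digest()
        for _ in range(iterations - 1):
            a = hashlib.new(hash_name, a).digest()
        out += a
        # B = A repeated to v bytes; each block I_j becomes I_j + B + 1.
        b_plus_1 = int.from_bytes((a * (v // u + 1))[:v], "big") + 1
        for j in range(0, len(i_buf), v):
            block = (int.from_bytes(i_buf[j:j + v], "big") + b_plus_1) % (1 << (8 * v))
            i_buf[j:j + v] = block.to_bytes(v, "big")
    return out[:size]

def check_password(password, hashfunc, pba_salt, pba_iter, pba_checksum):
    """PBA: derive 8 bytes with ID 3 and compare with 'pba_checksum'."""
    derived = pkcs12_kdf(HASHES[hashfunc], password, pba_salt, pba_iter, 3, 8)
    return derived == pba_checksum

def derive_key_and_iv(password, hashfunc, pbe_salt, pbe_iter, encryption_strength):
    """PBE: user key (ID 1, truncated) and user IV (ID 2)."""
    name = HASHES[hashfunc]
    key = pkcs12_kdf(name, password, pbe_salt, pbe_iter, 1, 32)[:encryption_strength]
    iv = pkcs12_kdf(name, password, pbe_salt, pbe_iter, 2, 16)
    return key, iv
```

With the typical iteration counts mentioned above (100000 and 200000), a pure-Python derivation takes a noticeable amount of time per password attempt.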

Identifying file streams

The name of the MS-CFB stream is derived by shuffling the bytes from the 'file_id' field and then encoding the result as hexadecimal.

The reordering is:

Initial  offset: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Shuffled offset: 3 2 1 0 5 4 7 6 8 9 10 11 12 13 14 15

The 16th byte is usually a NUL byte, hence the stream identifier is a 30-character-long string.
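
The reordering can be written as a lookup table. In this sketch, dropping a trailing NUL byte before encoding and using lowercase hexadecimal are both assumptions drawn from the 30-character remark above, so comparing names case-insensitively is probably safer; `stream_name_for` is a hypothetical name.

```python
# Shuffled offsets, as listed above.
SHUFFLE = (3, 2, 1, 0, 5, 4, 7, 6, 8, 9, 10, 11, 12, 13, 14, 15)

def stream_name_for(file_id):
    """Derive the MS-CFB stream name from a 16-byte 'file_id'."""
    if len(file_id) != 16:
        raise ValueError("file_id must be 16 bytes")
    shuffled = bytes(file_id[i] for i in SHUFFLE)
    # Assumption: a trailing NUL byte is not part of the encoded name.
    if shuffled[-1] == 0:
        shuffled = shuffled[:-1]
    return shuffled.hex()
```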

Decrypting files

The compressed stream is split in chunks of 512 bytes, each of them encrypted separately using AES CBC and the global archive encryption scheme. Decryption uses the global AES key (retrieved using the user credentials), and the global IV (retrieved from the deobfuscated archive metadata).

The IV for each chunk is computed by:

  • expressing the current chunk number as little endian on 16 bytes
  • XORing it with the global IV
  • encrypting with the global AES key in ECB mode (without IV).
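
The three steps above can be sketched as follows. As before, `ecb_encrypt_block` is a hypothetical callable that encrypts one 16-byte block with the global AES key. Whether chunk numbering starts at 0 is an assumption, which is why `first_number` is a parameter.

```python
CHUNK_SIZE = 512

def chunk_iv(chunk_number, global_iv, ecb_encrypt_block):
    """Compute the IV used for one 512-byte chunk."""
    if len(global_iv) != 16:
        raise ValueError("global IV must be 16 bytes")
    # Chunk number as a 16-byte little-endian integer.
    counter = chunk_number.to_bytes(16, "little")
    # XOR with the global IV, then a single block encryption (no IV).
    mixed = bytes(c ^ g for c, g in zip(counter, global_iv))
    return ecb_encrypt_block(mixed)

def iter_chunks(stream, first_number=0):
    """Yield (chunk_number, chunk_bytes) for an encrypted file stream."""
    offsets = range(0, len(stream), CHUNK_SIZE)
    for number, offset in enumerate(offsets, start=first_number):
        yield number, stream[offset:offset + CHUNK_SIZE]
```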

Each chunk is an independent stream and the decryption process involves end-of-stream handling even if this is not the end of the actual file. This is particularly important for the CTS handler.

Note: this is not to be confused with the CTR block cipher mode of operation, which operates differently and requires a nonce.

Decompressing files

Compressed streams are zlib streams with default compression options and can be decompressed following the zlib format.
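
With Python's standard zlib module this is a one-liner. The sketch below adds an optional size check against the 'file_size' field from the catalog; the function name `decompress_file` is hypothetical.

```python
import zlib

def decompress_file(compressed, expected_size=None):
    """Decompress a decrypted file stream, optionally checking its size."""
    data = zlib.decompress(compressed)
    if expected_size is not None and len(data) != expected_size:
        raise ValueError("decompressed size does not match 'file_size'")
    return data
```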

Test cases

Excluded for brevity, cf. https://www.beuc.net/zed/#test-cases.

Conventions and references

Feedback

Feel free to send comments to beuc@beuc.net. If you have .zed files that you think are not covered by this document, please send them as well (replace sensitive files with other ones). The author's GPG key can be found at 8FF1CB6E8D89059F.

Copyright (C) 2017 Sylvain Beucler

Copying and distribution of this file, with or without modification, are permitted in any medium without royalty provided the copyright notice and this notice are preserved. This file is offered as-is, without any warranty.

Posted Sun Sep 10 15:18:30 2017 Tags:

On my quest to generate reproducible standalone binaries for GNU FreeDink, I met new friends but currently lie defeated by an unexpected enemy...

Episode 1:

  • compiler version needs to be identical and recorded
  • build options and their order need to be identical and recorded
  • build path needs to be identical and recorded (otherwise debug symbols - and BuildIDs - change)
  • diffoscope helps checking for differences in build output

Episode 2:

  • use -Wl,--no-insert-timestamp for .exe (with old binutils 2.25 caveat)
  • no need to set a build path for stripped .exe (no ELF BuildID)
  • reprotest helps checking build variations automatically
  • MXE stack is apparently deterministic enough for a reproducible static build
  • umask needs to be identical and recorded
  • file timestamps need to be set and recorded (more on this in a future episode)

First, the random build differences when using -Wl,--no-insert-timestamp were explained.
peanalysis shows random build dates:

$ reprotest 'i686-w64-mingw32.static-gcc hello.c -I /opt/mxe/usr/i686-w64-mingw32.static/include -I/opt/mxe/usr/i686-w64-mingw32.static/include/SDL2 -L/opt/mxe/usr/i686-w64-mingw32.static/lib -lmingw32 -Dmain=SDL_main -lSDL2main -lSDL2 -lSDL2main -Wl,--no-insert-timestamp -luser32 -lgdi32 -lwinmm -limm32 -lole32 -loleaut32 -lshell32 -lversion -o hello && chmod 700 hello && analysePE.py hello | tee /tmp/hello.log-$(date +%s); sleep 1' 'hello'
$ diff -au /tmp/hello.log-1*
--- /tmp/hello.log-1490950327   2017-03-31 10:52:07.788616930 +0200
+++ /tmp/hello.log-1523203509   2017-03-31 10:52:09.064633539 +0200
@@ -18,7 +18,7 @@
 found PE header (size: 20)
     machine: i386
     number of sections: 17
-    timedatestamp: -1198218512 (Tue Jan 12 05:31:28 1932)
+    timedatestamp: 632430928 (Tue Jan 16 09:15:28 1990)
     pointer to symbol table: 4593152 (0x461600)
     number of symbols: 11581 (0x2d3d)
     size of optional header: 224
@@ -47,7 +47,7 @@
     Win32VersionValue: 0
     size of image (memory): 4640768
     size of headers (offset to first section raw data): 1536
-    checksum (for drivers): 4927867
+    checksum (for drivers): 4922616
     subsystem: 3
         win32 console binary
     DllCharacteristics: 0

Stephen Kitt mentioned 2 simple patches (1 2) fixing uninitialized memory in binutils.

These patches fix the variation and were submitted to MXE (pull request).


Next was playing with compiler support for SOURCE_DATE_EPOCH (which e.g. sets __DATE__ macros).
The FreeDink DFArc frontend historically displays a build date in the About box:

    "Build Date: %s\n", ..., __TDATE__

sadly support is only landing upstream in GCC 7 :/
I had to remove that date.


Now come the challenging parts.

All my tests with reprotest checked. I started writing a reproducible build environment based on Docker (git browse).
At first I could not run reprotest in the container, so I reworked it with SSH support, and reprotest validated determinism.
(I also generate a reproducible .zip archive, more on that later.)

So far so good, but were the releases identical when running reprotest successively on the different environments?
(reminder: this is a .exe build that is insensitive to varying path, hence consistent in a full reprotest)

$ sha256sum *.zip
189d0ca5240374896c6ecc6dfcca00905ae60797ab48abce2162fa36568e7cf1  freedink-109.0-bin-buildsh.zip
e182406b4f4d7c3a4d239eee126134ba5c0304bbaa4af3de15fd4f8bda5634a9  freedink-109.0-bin-docker.zip
e182406b4f4d7c3a4d239eee126134ba5c0304bbaa4af3de15fd4f8bda5634a9  freedink-109.0-bin-reprotest-docker.zip
37007f6ee043d9479d8c48ea0a861ae1d79fb234cd05920a25bb3db704828ece  freedink-109.0-bin-reprotest-null.zip

Ouch! Even though both the Docker and my host are running Stretch, there are differences.


For the two host builds (direct and reprotest), there is a subtle but simple difference: HOME.
HOME is invariably non-existent in reprotest, while my normal compilation environment has an existing home (duh!).

This caused a subtle bug when cross-compiling with mingw and wine-binfmt:

  • existing home: ./configure attempts to run conftest.exe, wine can create ~/.wine, conftest.exe runs with binfmt emulation, configure assumes:
  checking whether we are cross compiling... no
  • non-existing home: ./configure attempts to run conftest.exe, wine can't create ~/.wine, conftest.exe fails, configure assumes:
  checking whether we are cross compiling... yes

The respective binaries were very different notably due to a different config.h.
This can be fixed by specifying --build in addition to --host when calling ./configure.

I suggested reprotest have one of the tests with a valid HOME (#860428).


Now comes the big one, after the fix I still got:

$ sha256sum *.zip
3545270ef6eaa997640cb62d66dc610a328ce0e7d412f24c8f18fdc7445907fd  freedink-109.0-bin-buildsh.zip
cc50ec1a38598d143650bdff66904438f0f5c1d3e2bea0219b749be2dcd2c3eb  freedink-109.0-bin-docker.zip
3545270ef6eaa997640cb62d66dc610a328ce0e7d412f24c8f18fdc7445907fd  freedink-109.0-bin-reprotest-chroot.zip
cc50ec1a38598d143650bdff66904438f0f5c1d3e2bea0219b749be2dcd2c3eb  freedink-109.0-bin-reprotest-docker.zip
3545270ef6eaa997640cb62d66dc610a328ce0e7d412f24c8f18fdc7445907fd  freedink-109.0-bin-reprotest-null.zip

There is consistency on my host, and consistency within docker, but both are different.
Moreover, all the .o files were identical, so something must have gone wrong when compiling the libs, that is MXE.

After many checks it appears that libstdc++.a is different.
Just overwriting it gets me a consistent FreeDink release on all environments.
Still, when rebuilding it (make gcc), libstdc++.a always has the same environment-dependent checksum.

45f8c5d50a68aa9919ee3602a4e3f5b2bd0333bc8d781d7852b2b6121c8ba27b  /opt/mxe/usr/lib/gcc/i686-w64-mingw32.static/5.4.0/libstdc++.a  # host
6870b84f8e17aec4b5cf23cfe9c2e87e40d9cf59772a92707152361b6ebc1eb4  /opt/mxe/usr/lib/gcc/i686-w64-mingw32.static/5.4.0/libstdc++.a  # docker

The 2 libraries are very different, there's barely any blue in hexcompare.

At that time, I realized that the Docker "official" Debian are not really official, as Joey explains.
Could it be that Docker maliciously tampered with the compiler as Ken Thompson warned in 1984??

Well, before jumping to conclusions let's mix & match.

  • First I rsync a copy of my Docker filesystem and run it in a host chroot with a reset environment.
$ sudo env -i /usr/sbin/chroot chroot-docker/
$ exec bash -l
$ cd /opt/mxe
$ touch src/gcc.mk
$ sha256sum /opt/mxe/usr/lib/gcc/i686-w64-mingw32.static/5.4.0/libstdc++.a 
6870b84f8e17aec4b5cf23cfe9c2e87e40d9cf59772a92707152361b6ebc1eb4  /opt/mxe/usr/lib/gcc/i686-w64-mingw32.static/5.4.0/libstdc++.a
$ make gcc
[build]     gcc                    i686-w64-mingw32.static
[done]      gcc                    i686-w64-mingw32.static                                 2709464 KiB    7m2.039s
$ sha256sum /opt/mxe/usr/lib/gcc/i686-w64-mingw32.static/5.4.0/libstdc++.a 
45f8c5d50a68aa9919ee3602a4e3f5b2bd0333bc8d781d7852b2b6121c8ba27b  /opt/mxe/usr/lib/gcc/i686-w64-mingw32.static/5.4.0/libstdc++.a
# consistent with host builds
  • Then I import my previous reprotest chroot (plain debootstrap) in Docker:
$ sudo tar -C chroot -c . | docker import - chroot-debootstrap
$ docker run -ti chroot-debootstrap /bin/bash
$ sha256sum /opt/mxe/usr/lib/gcc/i686-w64-mingw32.static/5.4.0/libstdc++.a
45f8c5d50a68aa9919ee3602a4e3f5b2bd0333bc8d781d7852b2b6121c8ba27b  /opt/mxe/usr/lib/gcc/i686-w64-mingw32.static/5.4.0/libstdc++.a
$ touch src/gcc.mk
$ make gcc
[build]     gcc                    i686-w64-mingw32.static
[done]      gcc                    i686-w64-mingw32.static                                 2709412 KiB    7m6.608s
$ sha256sum /opt/mxe/usr/lib/gcc/i686-w64-mingw32.static/5.4.0/libstdc++.a
6870b84f8e17aec4b5cf23cfe9c2e87e40d9cf59772a92707152361b6ebc1eb4  /opt/mxe/usr/lib/gcc/i686-w64-mingw32.static/5.4.0/libstdc++.a
# consistent with docker builds

So, AFAICS when building with:

  • exactly the same kernel
  • exactly the same GCC sources
  • exactly the same host binaries

then depending on whether running in a container or not we get a consistent but different libstdc++.a.

This kind of issue is not detected with a simple reprotest build, as it only tests variations within a fixed build environment.
This is quite worrisome, I intend to use a container to control my build environment, but I can't guarantee that the container technology will be exactly the same 5 years from now.

All my setup is simple and available for inspection at https://git.savannah.gnu.org/cgit/freedink.git/tree/autobuild/freedink-w32-snapshot/.

I'd very much welcome enlightenment :)

Posted Mon Apr 17 15:33:45 2017 Tags:

Let's review what we learned so far:

  • compiler version needs to be identical and recorded
  • build options and their order need to be identical and recorded
  • build path needs to be identical and recorded
    (otherwise debug symbols - and BuildIDs - change)
  • diffoscope helps checking for differences in build output

We stopped when compiling a PE .exe produced a varying output.
It turns out that PE carries a build date timestamp.

The spec says that bound DLL timestamps are referred to in the "Delay-Load Directory Table". Maybe that's also the date Windows displays when a system-wide DLL is about to be replaced, too.
Build timestamps look unused in .exe files though.
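
To look at these fields without a dedicated tool, a few lines of Python are enough. This is a sketch based on the published PE layout (offset of the PE header at 0x3C, TimeDateStamp in the COFF header, CheckSum at offset 64 of the optional header); the function name `pe_build_fields` is hypothetical.

```python
import struct

def pe_build_fields(image):
    """Return (TimeDateStamp, CheckSum) from the bytes of a PE file."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # Offset of the PE signature, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("PE signature not found")
    # COFF header: Machine (2), NumberOfSections (2), TimeDateStamp (4), ...
    (timestamp,) = struct.unpack_from("<I", image, pe_offset + 8)
    # The optional header follows the 20-byte COFF header; CheckSum is at +64.
    (checksum,) = struct.unpack_from("<I", image, pe_offset + 24 + 64)
    return timestamp, checksum
```

It could be called as `pe_build_fields(open("hello.exe", "rb").read())` on two builds to compare the values.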

Anyway, Stephen Kitt pointed out (thanks!) that Debian's MinGW linker binutils-mingw-w64 has an upstream-pending patch that sets the timestamp to SOURCE_DATE_EPOCH if set.

Alternatively, one can pass -Wl,--no-insert-timestamp to set it to 0 (though see caveats below):

$ i686-w64-mingw32.static-gcc -Wl,--no-insert-timestamp hello.c -o hello.exe 
$ md5sum hello.exe 
298f98d74e6e913628a8b74514eddcb2  hello.exe
$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc -Wl,--no-insert-timestamp hello.c -o hello.exe 
$ md5sum hello.exe 
298f98d74e6e913628a8b74514eddcb2  hello.exe

If we don't care about debug symbols, unlike with ELF, stripped PE binaries look stable too!

$ cd repro/
$ i686-w64-mingw32.static-gcc hello.c -o hello.exe && i686-w64-mingw32.static-strip hello.exe
$ md5sum hello.exe 
6e07736bf8a59e5397c16e799699168d  hello.exe
$ i686-w64-mingw32.static-gcc hello.c -o hello.exe && i686-w64-mingw32.static-strip hello.exe
$ md5sum hello.exe 
6e07736bf8a59e5397c16e799699168d  hello.exe
$ cd ..
$ cp -a repro repro2/
$ cd repro2/
$ i686-w64-mingw32.static-gcc hello.c -o hello.exe && i686-w64-mingw32.static-strip hello.exe
$ md5sum hello.exe 
6e07736bf8a59e5397c16e799699168d  hello.exe

Now that we have the main executable covered, what about the dependencies?
Let's see how well MXE compiles SDL2:

$ cd /opt/mxe/
$ cp -a ./usr/i686-w64-mingw32.static/lib/libSDL2.a /tmp
$ rm -rf * && git checkout .
$ make sdl2
$ md5sum ./usr/i686-w64-mingw32.static/lib/libSDL2.a /tmp/libSDL2.a 
68909ab13181b1283bd1970a56d41482  ./usr/i686-w64-mingw32.static/lib/libSDL2.a
68909ab13181b1283bd1970a56d41482  /tmp/libSDL2.a

Neat - what about another build directory?

$ cd /usr/src/mxe
$ make sdl2
$ md5sum usr/i686-w64-mingw32.static/lib/libSDL2.a /tmp/libSDL2.a 
c6c368323927e2ae7adab7ee2a7223e9  usr/i686-w64-mingw32.static/lib/libSDL2.a
68909ab13181b1283bd1970a56d41482  /tmp/libSDL2.a
$ ls -l ./usr/i686-w64-mingw32.static/lib/libSDL2.a /tmp/libSDL2.a 
-rw-r--r-- 1 me me 5861536 mars  23 21:04 /tmp/libSDL2.a
-rw-r--r-- 1 me me 5862488 mars  25 19:46 ./usr/i686-w64-mingw32.static/lib/libSDL2.a

Well that was expected.
But what about the filesystem order?
With such an automated build, could potential variations in the order of files go undetected?
Would the output be different on another filesystem format (ext4 vs. btrfs...)?

It was a good opportunity to test the disorderfs fuse-based tool.
And while I'm at it, check if reprotest is easy enough to use (the manpage is scary).
Let's redo our basic tests with it - basic usage is actually very simple:

$ apt-get install reprotest disorderfs faketime
$ reprotest 'make hello' 'hello'
...
will vary: environment
will vary: fileordering
will vary: home
will vary: kernel
will vary: locales
will vary: exec_path
will vary: time
will vary: timezone
will vary: umask
...
--- /tmp/tmpk5uipdle/control_artifact/
+++ /tmp/tmpk5uipdle/experiment_artifact/
--- /tmp/tmpk5uipdle/control_artifact/hello
├── +++ /tmp/tmpk5uipdle/experiment_artifact/hello
├── stat {}
│ │ @@ -1,8 +1,8 @@
│ │  
│ │    Size: 8632       Blocks: 24         IO Block: 4096   regular file
│ │  Links: 1
│ │ -Access: (0755/-rwxr-xr-x)  Uid: ( 1000/      me)   Gid: ( 1000/      me)
│ │ +Access: (0775/-rwxrwxr-x)  Uid: ( 1000/      me)   Gid: ( 1000/      me)
│ │  
│ │  Modify: 1970-01-01 00:00:00.000000000 +0000
│ │  
│ │   Birth: -
# => OK except for permissions

$ reprotest 'make hello && chmod 755 hello' 'hello'
=======================
Reproduction successful
=======================
No differences in hello
c8f63b73265e69ab3b9d44dcee0ef1d2815cdf71df3c59635a2770e21cf462ec  hello

$ reprotest 'make hello CFLAGS="-g -O2"' 'hello'
# => lots of differences, as expected

Now let's apply to the MXE build.
We keep the same build path, and also avoid using linux32 (because MXE would then recompile all the host compiler tools for 32-bit):

$ reprotest --dont-vary build_path,kernel 'touch src/sdl2.mk && make sdl2 && cp -a usr/i686-w64-mingw32.static/lib/libSDL2.a .' 'libSDL2.a'
=======================
Reproduction successful
=======================
No differences in libSDL2.a
d9a39785fbeee5a3ac278be489ac7bf3b99b5f1f7f3e27ebf3f8c60fe25086b5  libSDL2.a

That checks!
What about a full MXE environment?

$ reprotest --dont-vary build_path,kernel 'make clean && make sdl2 sdl2_gfx sdl2_image sdl2_mixer sdl2_ttf libzip gettext nsis' 'usr'
# => changes in installation dates
# => timestamps in .exe files (dbus, ...)
# => libicu doesn't look reproducible (derb.exe, genbrk.exe, genccode.exe...)
# => apparently ar timestamp variations in libaclui

Most libraries look reproducible enough.
ar differences may go away at FreeDink link time since I'm aiming at a static build. Let's try!

First let's see how FreeDink behaves with stable dependencies.
We can compile with -Wl,--no-insert-timestamp and strip the binaries in a first step.
There are various issues (timestamps, permissions) but first let's check the executables themselves:

$ cd freedink/
$ reprotest --dont-vary build_path 'mkdir cross-woe-32/ && cd cross-woe-32/ && export PATH=/opt/mxe/usr/bin:$PATH && LDFLAGS='-Wl,--no-insert-timestamp' ../configure --host=i686-w64-mingw32.static --enable-static && make -j$(nproc) && make install-strip DESTDIR=$(pwd)/destdir' 'cross-woe-32/destdir/usr/local/bin'
# => executables are identical!

# Same again, just to make sure
$ reprotest --dont-vary build_path 'mkdir cross-woe-32/ && cd cross-woe-32/ && export PATH=/opt/mxe/usr/bin:$PATH && LDFLAGS='-Wl,--no-insert-timestamp' ../configure --host=i686-w64-mingw32.static --enable-static && make -j$(nproc) && make install-strip DESTDIR=$(pwd)/destdir' 'cross-woe-32/destdir/usr/local/bin'
--- /tmp/tmp2yw0sn4_/control_artifact/bin/freedink.exe
├── +++ /tmp/tmp2yw0sn4_/experiment_artifact/bin/freedink.exe
│ │ @@ -2,20 +2,20 @@
│ │  00000010: b800 0000 0000 0000 4000 0000 0000 0000  ........@.......
│ │  00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
│ │  00000030: 0000 0000 0000 0000 0000 0000 8000 0000  ................
│ │  00000040: 0e1f ba0e 00b4 09cd 21b8 014c cd21 5468  ........!..L.!Th
│ │  00000050: 6973 2070 726f 6772 616d 2063 616e 6e6f  is program canno
│ │  00000060: 7420 6265 2072 756e 2069 6e20 444f 5320  t be run in DOS 
│ │  00000070: 6d6f 6465 2e0d 0d0a 2400 0000 0000 0000  mode....$.......
│ │ -00000080: 5045 0000 4c01 0a00 e534 0735 0000 0000  PE..L....4.5....
│ │ +00000080: 5045 0000 4c01 0a00 0000 0000 0000 0000  PE..L...........
│ │  00000090: 0000 0000 e000 0e03 0b01 0219 00f2 3400  ..............4.
│ │  000000a0: 0022 4e00 0050 3b00 c014 0000 0010 0000  ."N..P;.........
│ │  000000b0: 0010 3500 0000 4000 0010 0000 0002 0000  ..5...@.........
│ │  000000c0: 0400 0000 0100 0000 0400 0000 0000 0000  ................
│ │ -000000d0: 00e0 8900 0004 0000 7662 4e00 0200 0000  ........vbN.....
│ │ +000000d0: 00e0 8900 0004 0000 89f8 4e00 0200 0000  ..........N.....
│ │  000000e0: 0000 2000 0010 0000 0000 1000 0010 0000  .. .............
│ │  000000f0: 0000 0000 1000 0000 00a0 8700 b552 0000  .............R..
│ │  00000100: 0000 8800 d02d 0000 0050 8800 5006 0000  .....-...P..P...
│ │  00000110: 0000 0000 0000 0000 0000 0000 0000 0000  ................
│ │  00000120: 0060 8800 4477 0100 0000 0000 0000 0000  .`..Dw..........
│ │  00000130: 0000 0000 0000 0000 0000 0000 0000 0000  ................
│ │  00000140: 0440 8800 1800 0000 0000 0000 0000 0000  .@..............
├── stat {}
│ │ │ @@ -1,8 +1,8 @@
│ │ │  
│ │ │    Size: 5121536       Blocks: 10008      IO Block: 4096   regular file
│ │ │  Links: 1
│ │ │  Access: (0755/-rwxr-xr-x)  Uid: ( 1000/      me)   Gid: ( 1000/      me)
│ │ │  
│ │ │ -Modify: 2017-03-26 01:26:35.233841833 +0000
│ │ │ +Modify: 2017-03-26 01:27:01.829592505 +0000
│ │ │  
│ │ │   Birth: -

Gah...
AFAIU there is something random in the linking phase, and sometimes the timestamp is removed, sometimes it's not.
Not very easy to track but I believe I reproduced it with the "hello" example:

# With MXE:
$ reprotest 'i686-w64-mingw32.static-gcc hello.c -I /opt/mxe/usr/i686-w64-mingw32.static/include -I/opt/mxe/usr/i686-w64-mingw32.static/include/SDL2 -L/opt/mxe/usr/i686-w64-mingw32.static/lib -lmingw32 -Dmain=SDL_main -lSDL2main -lSDL2 -lSDL2main -Wl,--no-insert-timestamp -luser32 -lgdi32 -lwinmm -limm32 -lole32 -loleaut32 -lshell32 -lversion -o hello && chmod 700 hello' 'hello'
# => different
# => maybe because it imports the build timestamp from -lSDL2main

# With Debian's MinGW (but without SOURCE_DATE_EPOCH):
$ reprotest 'i686-w64-mingw32-gcc hello.c -I /opt/mxe/usr/i686-w64-mingw32.static/include -I/opt/mxe/usr/i686-w64-mingw32.static/include/SDL2 -L/opt/mxe/usr/i686-w64-mingw32.static/lib -lmingw32 -Dmain=SDL_main -lSDL2main -lSDL2 -lSDL2main -Wl,--no-insert-timestamp -luser32 -lgdi32 -lwinmm -limm32 -lole32 -loleaut32 -lshell32 -lversion -o hello && chmod 700 hello' 'hello'
=======================
Reproduction successful
=======================
No differences in hello
0b2d99dc51e2ad68ad040d90405ed953a006c6e58599beb304f0c2164c7b83a2  hello

# Let's remove -Dmain=SDL_main and let our main() have precedence over the one in -lSDL2main:
$ reprotest 'i686-w64-mingw32.static-gcc hello.c -I /opt/mxe/usr/i686-w64-mingw32.static/include -I/opt/mxe/usr/i686-w64-mingw32.static/include/SDL2 -L/opt/mxe/usr/i686-w64-mingw32.static/lib -lmingw32 -lSDL2main -lSDL2 -lSDL2main -Wl,--no-insert-timestamp -luser32 -lgdi32 -lwinmm -limm32 -lole32 -loleaut32 -lshell32 -lversion -o hello && chmod 700 hello' 'hello'
=======================
Reproduction successful
=======================
No differences in hello
6c05f75eec1904d58be222cc83055d078b4c3be8b7f185c7d3a08b9a83a2ef8d  hello

$ LANG=C i686-w64-mingw32.static-ld --version  # MXE
GNU ld (GNU Binutils) 2.25.1
Copyright (C) 2014 Free Software Foundation, Inc.
$ LANG=C i686-w64-mingw32-ld --version  # Debian
GNU ld (GNU Binutils) 2.27.90.20161231
Copyright (C) 2016 Free Software Foundation, Inc.

It looks like there is some random behavior in binutils 2.25, coupled with SDL2's wrapping of my main().

So FreeDink is nearly reproducible, except for this build timestamp issue that pops up in all kinds of situations. In the worst case I can zero it out, or patch MXE's binutils until they upgrade.
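
As a rough sketch of the "zero it out" fallback: in the hex dump above, the differing bytes at 0x88 are the COFF TimeDateStamp, 8 bytes after the "PE\0\0" signature. The helper name pe_zero_timestamp is made up for illustration, and the snippet assumes od, tr and dd behave as in GNU coreutils. It does not recompute the optional-header CheckSum (the other differing field, at 0xd8 in the dump), so treat it as a starting point rather than a finished tool.

```shell
#!/bin/sh
# Sketch only: zero the COFF TimeDateStamp of a PE file, in place.
# pe_zero_timestamp is a hypothetical helper name, not an existing tool.
pe_zero_timestamp() {
    file=$1
    # The 4-byte little-endian offset of the "PE\0\0" signature lives at 0x3c.
    set -- $(od -An -v -tu1 -j60 -N4 "$file")
    [ $# -eq 4 ] || return 1
    pe_off=$(( $1 + ($2 << 8) + ($3 << 16) + ($4 << 24) ))
    # Refuse to touch anything that does not carry the PE signature.
    sig=$(od -An -v -tx1 -j"$pe_off" -N4 "$file" | tr -d ' \n')
    [ "$sig" = "50450000" ] || return 1
    # TimeDateStamp sits 8 bytes after the signature (0x88 in the dump above).
    printf '\000\000\000\000' |
        dd of="$file" bs=1 seek=$((pe_off + 8)) conv=notrunc 2>/dev/null
}
```

It would be called as pe_zero_timestamp src/freedinkedit.exe after the build, on a copy first since it edits the file in place.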

More importantly, what if I recompile FreeDink and the dependencies twice?

$ (cd /opt/mxe/ && make clean && make sdl2 sdl2_gfx sdl2_image sdl2_mixer sdl2_ttf glm libzip gettext nsis)
$ (mkdir cross-woe-32/ && cd cross-woe-32/ \
  && export PATH=/opt/mxe/usr/bin:$PATH \
  && LDFLAGS="-Wl,--no-insert-timestamp" ../configure --host=i686-w64-mingw32.static --enable-static \
  && make V=1 -j$(nproc) \
  && make install-strip DESTDIR=$(pwd)/destdir)
$ mv cross-woe-32/ cross-woe-32-1/

# Same again...
$ mv cross-woe-32/ cross-woe-32-2/

$ diff -ru cross-woe-32-1/destdir/ cross-woe-32-2/destdir/
[nothing]

Yay!
I could not reproduce the build timestamp issue in the stripped binaries, though it was still varying in the unstripped src/freedinkedit.exe.


I mentioned there were other changes noticed by diffoscope.

  • Changes in file timestamps.

That one is interesting.
Could be ignored, but we want to generate an identical binary package/archive too, right?
That's where archive meta-data matters.
make install INSTALL="$(which install) -p" could help for static files, but not generated ones.
The doc suggests clamping all files to SOURCE_DATE_EPOCH, i.e. every file newer than that timestamp gets its date set back to it:

$ export SOURCE_DATE_EPOCH=$(date +%s) \
  && reprotest --dont-vary build_path \
  'make ... && find destdir/ -newermt "@${SOURCE_DATE_EPOCH}" -print0 | xargs -0r touch --no-dereference --date="@${SOURCE_DATE_EPOCH}"' 'cross-woe-32/destdir/'
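
For the archive itself, a minimal sketch of how the metadata could be pinned when packing destdir/. This assumes a reasonably recent GNU tar (the --sort option is not available in older releases); repro_tar is a hypothetical wrapper name, and file modes are left as they are on disk, so the umask question still applies.

```shell
#!/bin/sh
# Sketch only: create a tar archive whose metadata does not depend on
# when, by whom or in which order the files were created.
# repro_tar is a hypothetical wrapper; the options are GNU tar's.
repro_tar() {
    out=$1
    shift
    # --sort=name      : stable member order instead of readdir() order
    # --mtime          : every member gets the SOURCE_DATE_EPOCH timestamp
    # --owner/--group  : do not record the building user
    tar --sort=name \
        --mtime="@${SOURCE_DATE_EPOCH:?set SOURCE_DATE_EPOCH first}" \
        --owner=0 --group=0 --numeric-owner \
        -cf "$out" "$@"
}
```

Usage would look like: export SOURCE_DATE_EPOCH=... && repro_tar freedink.tar destdir/ (the archive name is only an example).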

  • Changes in directory permissions

Caused by varying umask.
I attempted to mitigate the issue by playing with make install MKDIR_P="mkdir -p -m 755".
However even mkdir -p -m ... does not set permissions for intermediate directories.
Maybe it's better to set and record the umask...
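
A small demonstration of the difference, as a sketch: the directory names are invented for the example, and stat -c is the GNU coreutils syntax.

```shell
#!/bin/sh
# Sketch: compare "mkdir -p -m" with an explicit umask.
demo=$(mktemp -d)

# -m 755 only applies to the last component; parents follow the umask.
( umask 077 && mkdir -p -m 755 "$demo/with-m/usr/bin" )

# A fixed, recorded umask gives every created directory the same mode.
( umask 022 && mkdir -p "$demo/with-umask/usr/bin" )

# Print the octal mode of a path.
mode_of() { stat -c '%a' "$1"; }
for dir in with-m/usr with-m/usr/bin with-umask/usr with-umask/usr/bin; do
    echo "$(mode_of "$demo/$dir") $dir"
done
```

With a umask of 077 the intermediate with-m/usr ends up as 700 and only with-m/usr/bin gets 755, whereas both directories under with-umask/ get 755.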


So, aside from minor issues such as BuildIDs and build timestamps, the toolchain is pretty stable as of now.
The issue is more about fixing and recording the build environment.
Which is probably the next challenge :)
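
As a first step towards recording the environment, a sketch that dumps the settings that mattered in the tests above (umask, build path, SOURCE_DATE_EPOCH, locale, kernel) plus the first version line of a few tools. record_build_env and its output format are invented for this example, and it assumes each listed tool understands --version.

```shell
#!/bin/sh
# Sketch only: print a few facts about the build environment, one per line.
# record_build_env is a hypothetical helper; pass tool names as arguments.
record_build_env() {
    echo "umask=$(umask)"
    echo "build_path=$PWD"
    echo "SOURCE_DATE_EPOCH=${SOURCE_DATE_EPOCH:-unset}"
    echo "LANG=${LANG:-unset}"
    echo "TZ=${TZ:-unset}"
    echo "uname=$(uname -srm)"
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            # Keep only the first line of the version banner.
            echo "tool:$tool=$("$tool" --version 2>/dev/null | head -n 1)"
        else
            echo "tool:$tool=missing"
        fi
    done
}
```

For instance: record_build_env i686-w64-mingw32.static-gcc i686-w64-mingw32.static-ld make > build-env.txt, kept alongside the release.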

Posted Tue Mar 28 17:22:07 2017 Tags:

As GNU FreeDink upstream, I'd very much like to offer pre-built binaries: one (1) official, tested, current, distro-agnostic version of the game with its dependencies.
I'm actually already doing that for the Windows version.
One issue though: people have to trust me -- and my computer's integrity.
Reproducible builds could address that.
My release process is tightly controlled, but is my project reproducible? If not, what do I need? Let's check!

I quickly see that documentation is getting better, namely https://reproducible-builds.org/ :)
(The first docs I read on reproducibility looked more like a crazed date-o-phobic rant than an actual solution - plus now we have SOURCE_DATE_EPOCH implemented in gcc ;))

However I was left unsatisfied by the very high-level viewpoint and the lack of concrete examples.
The document points to various issues but is very vague about what tools are impacted.

So let's do some tests!


Let's start with a trivial program:

$ cat > hello.c
#include <stdio.h>
int main(void) {
    printf("Hello, world!\n");
}

OK, first does GCC compile this reproducibly?
I'm not sure because I heard of randomness in identifiers and such in the compilation process...

$ gcc-5 hello.c -o hello-5
$ md5sum hello-5
a00416d7392442321bad4afc5a461321  hello-5
$ gcc-5 hello.c -o hello-5
$ md5sum hello-5
a00416d7392442321bad4afc5a461321  hello-5

Cool, ELF compiler output is stable through time!
Now do 2 versions of GCC compile a hello world identically?

$ gcc-6 hello.c -o hello-6
$ md5sum hello-6
f7f52c2f5f82fe2a95061a771a6c5acd  hello-6
$ hexcompare hello-5 hello-6
[lots of red]
...

Well let's not get our hopes too high ;)
Trivial build options change?

$ gcc-6 hello.c -lc -o hello-6
$ gcc-6 -lc hello.c -o hello-6b
$ md5sum hello-6 hello-6b
f7f52c2f5f82fe2a95061a771a6c5acd  hello-6
f73ee6d8c3789fd8f899f5762025420e  hello-6b
$ hexcompare hello-6 hello-6b
[lots of red]
...

OK, let's be very careful with build options then. What about 2 different build paths?

$ cd ..
$ cp -a repro/ repro2/
$ cd repro2/
$ gcc-6 hello.c -o hello-6
$ md5sum hello-6
f7f52c2f5f82fe2a95061a771a6c5acd  hello-6

Basic compilation is stable across directories.
Now I tried recompiling FreeDink identically in 2 different git clones.
Disappointment:

$ md5sum freedink/native/src/freedink freedink2/native/src/freedink
839ccd9180c72343e23e5d9e2e65e237  freedink/native/src/freedink
6d5dc6aab321fab01b424ac44c568dcf  freedink2/native/src/freedink
$ hexcompare freedink2/native/src/freedink freedink/native/src/freedink
[lots of red]

Hmm, what about stripped versions?

$ strip freedink/native/src/freedink freedink2/native/src/freedink
$ md5sum freedink/native/src/freedink freedink2/native/src/freedink
415e96bb54456f3f2a759f404f18c711  freedink/native/src/freedink
e0702d798807c83d21f728106c9261ad  freedink2/native/src/freedink
$ hexcompare freedink/native/src/freedink freedink2/native/src/freedink
[1 single red spot]

OK, what's happening? diffoscope to the rescue:

$ diffoscope freedink/native/src/freedink freedink2/native/src/freedink
--- freedink/native/src/freedink
+++ freedink2/native/src/freedink
├── readelf --wide --notes {}
│ @@ -3,8 +3,8 @@
│    Owner                 Data size  Description
│    GNU                  0x00000010  NT_GNU_ABI_TAG (ABI version tag)
│      OS: Linux, ABI: 2.6.32
│  
│  Displaying notes found in: .note.gnu.build-id
│    Owner                 Data size  Description
│    GNU                  0x00000014  NT_GNU_BUILD_ID (unique build ID bitstring)
│ -    Build ID: a689574d69072bb64b28ffb82547e126284713fa
│ +    Build ID: d7be191a61e84648a58c18e9c108b3f3ce500302

What on earth is Build ID and how is it computed?
After much digging, I find it's a 2008 plan, meant for selecting the detached debugging symbols that match a given binary.
https://fedoraproject.org/wiki/RolandMcGrath/BuildID is the most detailed overview/rationale I found.
It is supposed to be computed from parts of the binary. It's actually pretty resistant to changes, e.g. I could add the missing "return 0;" in my hello source and get the exact same Build ID!
On the other hand my FreeDink binaries do match except for the Build ID so there must be a catch.

Let's try our basic example with default ./configure CFLAGS:

$ (cd repro/ && gcc -g -O2 hello.c -o hello)
$ (cd repro/ && gcc -g -O2 hello.c -o hello-b)
$ md5sum repro/hello repro/hello-b
6b2cd79947d7c5ed2e505ddfce167116  repro/hello
6b2cd79947d7c5ed2e505ddfce167116  repro/hello-b
# => OK for now

$ (cd repro2/ && gcc -g -O2 hello.c -o hello)
$ md5sum repro2/hello
20b4d09d94de5840400be05bc76e4172  repro2/hello
$ strip repro/hello repro2/hello
$ diffoscope repro/hello repro2/hello
--- repro/hello
+++ repro2/hello
├── readelf --wide --notes {}
│ @@ -3,8 +3,8 @@
│    Owner                 Data size  Description
│    GNU                  0x00000010  NT_GNU_ABI_TAG (ABI version tag)
│      OS: Linux, ABI: 2.6.32
│  
│  Displaying notes found in: .note.gnu.build-id
│    Owner                 Data size  Description
│    GNU                  0x00000014  NT_GNU_BUILD_ID (unique build ID bitstring)
│ -    Build ID: 462a3c613537bb57f20bd3ccbe6b7f6d2bdc72ba
│ +    Build ID: b4b448cf93e7b541ad995075d2b688ef296bd88b
# => issue reproduced with -g -O2 and different build directories

$ (cd repro/ && gcc -O2 hello.c -o hello)
$ (cd repro2/ && gcc -O2 hello.c -o hello)
$ md5sum repro/hello repro2/hello
1571d45eb5807f7a074210be17caa87b  repro/hello
1571d45eb5807f7a074210be17caa87b  repro2/hello
# => culprit is not -O2, so culprit is -g

Bummer. So the build ID must be computed also from the debug symbols, even if I strip them afterwards :(
OK, so when https://reproducible-builds.org/docs/build-path/ says "Some tools will record the path of the source files in their output", that means the compiler, and more importantly the stripped executable.

Conclusion: apparently to achieve reproducible builds I need identical full build paths and to keep track of them.
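
An alternative that was not tried in the tests above: GCC's -fdebug-prefix-map=old=new option rewrites the directory recorded in the debug info. The sketch below assumes a recent GCC, since a compiler that also records its own command line in the debug info would leak the path through the option itself; build_hello is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: compile hello.c with debug info, recording the build
# directory as "." instead of its absolute path.
# build_hello is a hypothetical helper; $1 is a directory holding hello.c.
build_hello() {
    (
        cd "$1" || exit 1
        # $PWD is expanded after the cd, so it names the build directory.
        gcc -g -O2 -fdebug-prefix-map="$PWD=." hello.c -o hello
    )
}
```

Running build_hello repro/ and build_hello repro2/ and then comparing the two md5sums shows whether the Build ID still differs with a given compiler.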

What about Windows/MinGW btw?

$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc hello.c -o hello.exe
$ md5sum hello.exe 
e0fa685f6866029b8e03f9f2837dc263  hello.exe
$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc hello.c -o hello.exe
$ md5sum hello.exe 
df7566c0ac93ea4a0b53f4af83d7fbc9  hello.exe
$ /opt/mxe/usr/bin/i686-w64-mingw32.static-gcc hello.c -o hello.exe
$ md5sum hello.exe 
bbf4ab22cbe2df1ddc21d6203e506eb5  hello.exe

PE compiler output is not stable through time.
(any clue?)

OK, there's still a long road ahead of us...


There are lots of other questions.
Is autoconf output reproducible?
Does it actually matter if autoconf is reproducible if upstream is providing a pre-generated ./configure?
If not what about all the documentation on making tarballs reproducible, along with the strip-nondeterminism tool?
Where do we draw the line between build and build environment?
What are the legal issues of distributing a docker-based build environment without every single matching distro source package?

That was my modest contribution to practical reproducible builds documentation for developers; I'd very much like to hear about more of it.
Who knows, maybe in the near future we'll get reproducible official builds for Eclipse, ZAP, JetBrains, Krita, Android SDK/NDK... :)

Posted Fri Mar 24 00:09:12 2017 Tags:

Based on the documentation from http://android-rebuilds.beuc.net/, I set up fully-automated recipes to rebuild the Android SDK, including setting up the reference distro environment and dependencies, with help from Docker.

For instance, to rebuild a complete Android SDK 6.0.1, check out

https://gitlab.com/android-rebuilds/auto/tree/master/sdk-6.0.1

and type:

   make

That's all!

Posted Fri Jan 22 21:43:12 2016 Tags:

I published some Free rebuilds of the Android SDK, NDK and ADT at:

http://android-rebuilds.beuc.net/

As described in my previous post, Google is click-wrapping all developer binaries (including preview versions for which source code isn't published yet) with a non-free EULA, notably an anti-fork clause.

There's been some discussion on the android@lists.fsfe.org campaign list about where to host this project.

Build instructions are provided, so feel free to check if the builds are reproducible, and contribute instructions for more tools!

Posted Thu Oct 1 14:41:44 2015 Tags:

Going back to Android recently, I saw that all tool binaries from the Android project are now click-wrapped by a quite ugly proprietary license, which includes, among others, an anti-fork clause (details below). Apparently those T&C are years old, but the click-wrapping is newer.

This applies to the SDK, the NDK, Android Studio, and all the essentials you download through the Android SDK Manager.

Since I keep my hands clean of smelly EULAs, I'm working on rebuilding the Android tools I need.
We're talking about hours-long, quad-core + 8GB-RAM + 100GB-disk-eating builds here, so I'd like to publish them as part of a project that cares.

As a proof-of-concept, the Replicant project ships a 4.2 SDK and I contributed build instructions for ADT and NDK (which I now use daily).

(Replicant is currently stuck on a 2013 code base though.)

I also have in-progress instructions on my hard-drive to rebuild various newer versions of the SDK/API levels, and for the NDK whose releases are quite hard to reproduce (no git tags, requires fixes committed after the release, updates are partial rebuilds, etc.) - not to mention that Google doesn't publish the source code until after the official release (closed development) :/ And in some cases like Android Support Repository [not Library] I didn't even find the proper source code, only an old prebuilt.

Would you be interested in contributing, and would you recommend a structure that would promote Free, rebuilt Android *DK?

The legalese

Anti-fork clause:

3.4 You agree that you will not take any actions that may cause or result in the fragmentation of Android, including but not limited to distributing, participating in the creation of, or promoting in any way a software development kit derived from the SDK.

So basically the source is Apache 2 + GPL, but the binaries are non-free. By the way this is not a GPL violation because right after:

3.5 Use, reproduction and distribution of components of the SDK licensed under an open source software license are governed solely by the terms of that open source software license and not this License Agreement.

Still, AFAIU by clicking "Accept" to get the binary you accept the non-free "Terms and Conditions".

(Incidentally, if Google wanted SDK forks to spread and increase fragmentation, introducing an obnoxious EULA is probably the first thing I'd have recommended. What was its legal team thinking?)

Indemnification clause:

12.1 To the maximum extent permitted by law, you agree to defend, indemnify and hold harmless Google, its affiliates and their respective directors, officers, employees and agents from and against any and all claims, actions, suits or proceedings, as well as any and all losses, liabilities, damages, costs and expenses (including reasonable attorneys fees) arising out of or accruing from (a) your use of the SDK, (b) any application you develop on the SDK that infringes any copyright, trademark, trade secret, trade dress, patent or other intellectual property right of any person or defames any person or violates their rights of publicity or privacy, and (c) any non-compliance by you with this License Agreement.

Usage restriction:

3.1 Subject to the terms of this License Agreement, Google grants you a limited, worldwide, royalty-free, non-assignable and non-exclusive license to use the SDK solely to develop applications to run on the Android platform.

3.3 You may not use the SDK for any purpose not expressly permitted by this License Agreement. Except to the extent required by applicable third party licenses, you may not: (a) copy (except for backup purposes), modify, adapt, redistribute, decompile, reverse engineer, disassemble, or create derivative works of the SDK or any part of the SDK; or (b) load any part of the SDK onto a mobile handset or any other hardware device except a personal computer, combine any part of the SDK with other software, or distribute any software or device incorporating a part of the SDK.

If you know the URLs, you can still direct-download some of the binaries which don't embed the license, but all this feels fishy. GNU licensing didn't answer me (yet). Maybe debian-legal has an opinion?

In any case, the difficulty of reproducing the *DK builds is worrying enough to warrant an independent rebuild.

Did you notice this?

Posted Tue Sep 22 00:03:40 2015 Tags:

Here's a WebGL-powered invitation to the GNU Hackers Meeting 2013, which is taking place in Paris on August 22-25 :)

http://www.beuc.net/demo-ghm/v1/

Free, hardware-accelerated WebGL support is recent under GNU/Linux: you'll need Mesa/Mesa-DRI 9 and Linux kernel 3.6 from Debian sid plus a recent Firefox, though a Mesa-software fallback is usually available in earlier versions. Meanwhile, here's a WebM capture:

(Curious about the capture process?)

Work is underway on an extended version of this invitation/demo, peek here :)

http://www.beuc.net/demo-ghm/v2/

Posted Sun Jul 7 15:13:54 2013 Tags:

Great news!

The Learn OpenGL ES website recently switched its licensing to Creative Commons BY-SA 3.0 :)

http://www.learnopengles.com/announcing-the-new-qa-and-forums-and-a-roundup/

It provides tutorials for OpenGL ES using Java/Android and WebGL, and is focusing on a more community-oriented creative process. Give them cheers!

Posted Sat Feb 23 22:25:46 2013 Tags:
