Obsolescence Notice

The version described in this document is obsolete and should not be used for new applications anymore.

The links to the Javadoc in this document are non-functional because the package names have been changed in TrueZIP 7. The Javadoc for TrueZIP 6.8.4 is now available for download via Maven Central.

Introduction

Key Claims:

  1. Archive files are virtual directories!
  2. Make simple things easy and complex things possible!
  3. Quality matters!

TrueZIP is a Java based virtual file system (VFS) implementation for transparent read/write access to archive files as if they were directories. Archive files may be arbitrarily nested and the nesting level is only limited by heap and file system size. TrueZIP's design strategy is to "make simple things easy and complex things possible".

For example, when using the default configuration, the instance new File("app.ear/web.war/WEB-INF/lib/lib.jar/META-INF/MANIFEST.MF") would refer to the text file entry META-INF/MANIFEST.MF within the JAR file entry WEB-INF/lib/lib.jar within the WAR file entry web.war within the EAR file app.ear. The same applies for any other supported and configured archive type.

Limitation: TrueZIP works on files only. If an archive is just available as an InputStream, it needs to be saved to a (temporary) file first. If this is not an option, TrueZIP can't be used.
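The workaround for this limitation can be sketched as follows. This is only a minimal illustration which assumes a Java 7 or later runtime for java.nio.file (TrueZIP 6 itself only needs J2SE 1.4.2); the class name SpoolToTempFile and the method spool are hypothetical and not part of TrueZIP.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SpoolToTempFile {

    /** Copies the whole stream to a new temporary file and returns its path. */
    public static Path spool(InputStream in, String suffix) throws IOException {
        Path temp = Files.createTempFile("spooled-", suffix);
        try {
            // createTempFile already created the (empty) file, so replace it.
            Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            Files.deleteIfExists(temp); // don't leave a partial file behind
            throw e;
        }
        return temp;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4}; // stand-in for real archive bytes
        Path temp;
        try (InputStream in = new ByteArrayInputStream(data)) {
            temp = spool(in, ".zip");
        }
        try {
            System.out.println(Files.size(temp)); // prints 4
            // At this point the path could be handed to TrueZIP, e.g.
            // new de.schlichtherle.io.File(temp.toString())
        } finally {
            Files.delete(temp);
        }
    }
}
```

The suffix passed to spool should be one of the configured archive file suffixes so that the temporary file is recognized as an archive file.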

TrueZIP provides a pluggable archive driver architecture to support virtually any archive type: Out of the box, TrueZIP supports ZIP, SFX/EXE, JAR, TZP, TAR, TAR.BZ2 and TAR.GZ. This list will be extended in future releases. Note that some of these archive types require additional JARs on the run time class path.

To be minimally intrusive, TrueZIP provides drop-in replacements for many classes in the java.io package. This keeps the learning curve to a minimum and allows TrueZIP's functionality to be introduced gradually into legacy applications or even other VFS solutions which are currently built on top of java.io.File*.

TrueZIP 6 requires only a J2SE 1.4.2 compatible Java Runtime Environment, but version 6.4 and later automatically benefit from new features available in Java SE 5 and even Java SE 6.

TrueZIP 6 is covered under the Apache License, Version 2.0.

Basic Tasks

Setting up the Class Path

With its default configuration file, TrueZIP 6 is self contained, so all you need to add to the class path is the file truezip.jar. Note that the file name just includes the major version number in order to express that a later version should be binary compatible to any previous version with the same major version number.

If the configuration gets customized, additional JARs may be required on the run time class path. Please see below how to customize the configuration and refer to the API Overview for the available options and dependencies.

Important: If TrueZIP is going to be used in a multi-class-loader environment (e.g. application servers), it must be added to the boot class path or extension class path. Otherwise, the different class definitions may shadow and bypass each other's associated archive file state, which may even cause loss of data! Please refer to the section "Third Party Access" in the Javadoc for the package de.schlichtherle.io for full details and workarounds. Simply consider TrueZIP to be an extension to the JRE, not part of a web application.

Importing TrueZIP

In order to use TrueZIP, you need to import its packages. Unless otherwise noted, only the high level API in the package de.schlichtherle.io is required for the examples in this tutorial. Because it provides drop-in replacements for some equally named classes in the package java.io, the following boilerplate is required:

import de.schlichtherle.io.*;
import de.schlichtherle.io.File;
import de.schlichtherle.io.FileInputStream;
import de.schlichtherle.io.FileOutputStream;
import java.io.*;
// ...
File file = new File("archive.zip"); // de.schlichtherle.io.File!

Please do not do this instead:

import java.io.*;
// ...
de.schlichtherle.io.File file = new de.schlichtherle.io.File("archive.zip");

This is for the following reasons:

  1. Accidentally using java.io.File and de.schlichtherle.io.File instances referring to the same path concurrently will result in erroneous behaviour and may even cause loss of data! Please refer to the section "Third Party Access" in the Javadoc for the package de.schlichtherle.io for full details and workarounds.
  2. A de.schlichtherle.io.File subclasses java.io.File and thanks to polymorphism can be used everywhere a java.io.File could be used.
  3. Shorter source code is much easier to read.
  4. I'm not vain enough to want you to write my name all over the place. If you're becoming a fan, please make a donation instead. ;-)

If you still need to use java.io.File, please use this fully qualified class name.

Casting Return Values

Some methods in the File class return new File instances. These methods are guaranteed to return a de.schlichtherle.io.File. However, since TrueZIP 6 is compiled with -source 1.4, the overridden methods are still declared to return a java.io.File and hence a cast may be required:

File file = new File("archive.zip/entry");
File parent = (File) file.getParentFile(); // cast required!

This is going to change in TrueZIP 7, which will require Java SE 5.

Basic Operations

You may already guess how to create a new ZIP file by now:

new File("archive.zip").mkdir();

This works for any archive file suffixes which are configured to be recognized. By default, this is "ear|jar|war|zip".

Note that the call to mkdir() is actually redundant: TrueZIP creates archive files and any missing directories inside on the fly unless setLenient(false) has been called to switch off lenient behavior. Note again that this feature only works for archive files and directories inside archive files: The parent directory of a top level archive file must always exist.

Since TrueZIP 6.5, if the call to mkdir() is omitted, TrueZIP will not create an entry for the directory in the output archive file. This is to mimic the behavior of most archive utilities which do not create archive entries for directories.

For example, to start writing a new file entry in a JAR archive you may simply use:

OutputStream out = new FileOutputStream("my-killer-app.jar/META-INF/LICENSE.TXT");
try {
    // Do I/O here...
} finally {
    out.close(); // ALWAYS close the stream!
}

However, it's highly recommended to use one of the more advanced cat* and copy* methods in the File class instead whenever applicable. These methods provide ease of use, enhanced features, superior performance and require less space in the temp file folder (see below).

Note how the code ensures that the stream gets closed even if an IOException is thrown by calling close() in a finally-block. If the client application does not properly close its streams, TrueZIP may throw a FileBusyException, an ArchiveBusyWarningException or an ArchiveBusyException on certain operations, as documented in the Javadoc.

This idiom is not at all specific to TrueZIP: Streams often use OS resources such as file descriptors, database or network connections. All OS resources are limited, however, and sometimes they are even exclusively allocated for a stream, so a stream should always be closed as soon as possible, especially in long running server applications (relying on finalize() to do this during garbage collection is unsafe). Unfortunately, many Java applications and libraries fail in this respect.

The top level entries in an archive file build its virtual root directory. Just like a regular directory, you can list its contents as follows:

String[] members = new File("my-killer-app.jar").list();

Following the previous example, just new String[] { "META-INF" } would be returned. So just like a regular directory, only the contents of the virtual root directory are listed, not including the contents of its directory members (this method is not recursive).

Note that META-INF is returned although it hasn't been created with mkdir() and hence will not be output to the resulting JAR file. Such a directory is called a ghost directory: A ghost directory behaves like a regular directory with the exception that its last modification time returned by lastModified() is 0L. If the client application sets the last modification time explicitly using setLastModified(long), then the ghost directory reincarnates as a regular directory and will be output to the archive file.
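Why ghost directories are needed can be illustrated without TrueZIP: the ZIP file format allows a file entry like META-INF/LICENSE.TXT to exist without any entry for its parent directory. The following sketch uses only java.util.zip from the JDK and an in-memory archive; the class and method names are hypothetical and it does not show how TrueZIP itself is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class GhostDirectorySketch {

    /** Builds an in-memory ZIP file which holds exactly one file entry. */
    public static byte[] zipWithSingleFile(String entryName, String content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName)); // no entry for the parent directory
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Lists the names of all entries which are physically present in the ZIP file. */
    public static List<String> entryNames(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            for (ZipEntry entry; (entry = zip.getNextEntry()) != null; ) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = zipWithSingleFile("META-INF/LICENSE.TXT", "license text");
        System.out.println(entryNames(zip)); // prints [META-INF/LICENSE.TXT]
    }
}
```

The archive contains no META-INF/ entry, so a virtual file system has to infer this directory from the names of its members in order to list it.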

Essential Concepts

Unfortunately, before we can continue with the task oriented approach in this tutorial, you'll need to learn about some essential concepts of TrueZIP. This is to keep the examples more concise by skipping the configuration boilerplate and to set you on the right track so that you can safely avoid some common pitfalls (yes, there are some). Please accept my apologies for this - I'll try to keep this section as short as possible.

Since version 6.0, TrueZIP supports a pluggable archive driver architecture which allows it to support virtually any archive type. This enables third parties to develop their own drivers and plug them into the TrueZIP API. The client application can extend or override the configuration provided by TrueZIP and any optional plug-in drivers.

The ArchiveDetector Interface

Whenever a File instance is constructed, an instance of the ArchiveDetector interface is assigned to it which detects archive files solely by scanning the file path - usually by testing for file name suffixes like .zip. Whenever an archive file is recognized, the ArchiveDetector.getArchiveDriver(String) method returns an instance of the ArchiveDriver interface which provides access to it.

ArchiveDetector instances are assigned to File instances in the following way:

  1. If an archive detector is explicitly provided as a parameter to the constructor of the File class or any other method which creates File instances (e.g. listFiles(*)), then this archive detector is used.
  2. Otherwise, the archive detector returned by File.getDefaultArchiveDetector() is used. This is initially set to the predefined instance ArchiveDetector.DEFAULT. Both the class property and the predefined instance can be customized (see below).

For your convenience, the ArchiveDetector interface provides three predefined instances as constant fields. These are actually instances of the DefaultArchiveDetector class (see next section).

  • ArchiveDetector.NULL never recognizes archive files in a path. This can be used as the end of a chain of DefaultArchiveDetector instances or if archive files shall be treated like ordinary files rather than (virtual) directories.
  • ArchiveDetector.DEFAULT recognizes the archive file suffixes defined by the key DEFAULT in the configuration file(s). If only TrueZIP's default configuration file is used, then this is set so that no additional JARs are required on the run time class path.
  • ArchiveDetector.ALL recognizes all archive file suffixes registered in the global registry by the configuration file(s). This requires additional JARs on the run time class path.

The DefaultArchiveDetector Class

The class DefaultArchiveDetector is the default implementation of the ArchiveDetector interface. Each instance matches file paths against a pattern of archive file suffixes in order to detect prospective archive files and look up their corresponding archive driver in its registry.

When this class is loaded, it uses the current thread's context class loader to enumerate all instances of the relative path META-INF/services/de.schlichtherle.io.registry.properties on the class path. These configuration files are processed in arbitrary order to configure the global registry of archive file suffixes and archive drivers. This allows archive drivers to be "plugged in" by simply providing their own configuration file somewhere on the class path. One such instance is located inside truezip.jar and contains TrueZIP's default configuration (please refer to this file for full details on the syntax). Likewise, the client application may provide its own configuration file somewhere on the class path in order to extend or override the settings configured by TrueZIP and any optional plug-in drivers.
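The class path scan described above can be sketched with plain JDK calls. This is not TrueZIP's actual loading code: it assumes that the configuration files follow the java.util.Properties syntax and simply lets later files override earlier keys, which may differ from how TrueZIP merges them. The class name RegistryScanSketch and the method loadAll are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Enumeration;
import java.util.Properties;

public class RegistryScanSketch {

    /** Loads every resource with the given name on the class path into one Properties object. */
    public static Properties loadAll(ClassLoader loader, String resourceName) throws IOException {
        Properties merged = new Properties();
        Enumeration<URL> urls = loader.getResources(resourceName);
        while (urls.hasMoreElements()) {
            try (InputStream in = urls.nextElement().openStream()) {
                merged.load(in); // keys loaded later replace keys loaded earlier
            }
        }
        return merged;
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        Properties registry = loadAll(loader,
                "META-INF/services/de.schlichtherle.io.registry.properties");
        // Prints null unless a matching configuration file is on the class path.
        System.out.println(registry.getProperty("DEFAULT"));
    }
}
```

Because the files are found via the class loader, adding a JAR with its own configuration file to the class path is enough to make its entries visible.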

Each instance has a local registry. Constructors are provided which allow an instance to:

  1. Filter the set of archive file suffixes in the global registry. For example, "tar|zip" could be accepted by the filter in order to recognize only the TAR and ZIP file formats.
  2. Add custom archive file suffixes for supported archive types to the local registry in order to create custom archive types. For example, "myapp" could be added as a custom archive file suffix for the JAR file format.
  3. Add custom archive file suffixes and archive drivers to the local registry in order to support new archive types. For example, the suffix "7z" could be associated to a custom archive driver which supports the 7z file format.
  4. Put multiple instances in a chain of responsibility: The first instance which holds a mapping for any given archive file suffix in its registry determines the archive driver to be used.

Altogether, this makes it possible to build arbitrarily complex configurations with very few lines of Java code or properties in the configuration file(s).
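The idea of a local registry with a chain of responsibility can be sketched in a few lines of plain Java. This is a simplified model and not the DefaultArchiveDetector implementation: the class SuffixChainSketch is hypothetical, the driver names are placeholder strings, and the matching rules (last path segment only, every dot tried from left to right) are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class SuffixChainSketch {

    private final SuffixChainSketch delegate; // next link in the chain, or null
    private final Map<String, String> registry = new HashMap<>(); // suffix -> driver name

    public SuffixChainSketch(SuffixChainSketch delegate, String... suffixDriverPairs) {
        if (suffixDriverPairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected suffix/driver pairs");
        }
        this.delegate = delegate;
        for (int i = 0; i < suffixDriverPairs.length; i += 2) {
            registry.put(suffixDriverPairs[i].toLowerCase(Locale.ROOT), suffixDriverPairs[i + 1]);
        }
    }

    /** Returns the driver name for the path, or null if it's not a prospective archive file. */
    public String driverFor(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1).toLowerCase(Locale.ROOT);
        // Try the longest suffix first so that "tar.gz" is tested before "gz".
        for (int dot = name.indexOf('.'); dot >= 0; dot = name.indexOf('.', dot + 1)) {
            String driver = registry.get(name.substring(dot + 1));
            if (driver != null) {
                return driver; // the first link holding a mapping wins
            }
        }
        return delegate != null ? delegate.driverFor(path) : null;
    }

    public static void main(String[] args) {
        SuffixChainSketch base = new SuffixChainSketch(null,
                "zip", "zip-driver", "tar.gz", "tar.gz-driver");
        SuffixChainSketch custom = new SuffixChainSketch(base, "foo", "jar-driver");
        System.out.println(custom.driverFor("document.FOO"));       // prints jar-driver
        System.out.println(custom.driverFor("dir/archive.tar.gz")); // prints tar.gz-driver
        System.out.println(custom.driverFor("readme.txt"));         // prints null
    }
}
```

Passing null as the delegate plays the role of the end of the chain: a suffix which is not mapped anywhere is simply not recognized.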

Updating Archive Files

To provide random read/write access to archive files, TrueZIP needs to associate some state with every recognized archive file on the heap and in the folder for temporary files while the client application is operating on the VFS.

TrueZIP performs the mounting and unmounting of the VFS for archive files implicitly and you can safely rely on this. However, sometimes explicit unmounting may be required if...

  1. third parties require access to the same path, or...
  2. some control is required over the exceptions which may be thrown when unmounting, or...
  3. the progress of updating large archive files shall get monitored.

In this context, third parties are:

  1. Instances of the class java.io.File which are not instances of the class de.schlichtherle.io.File.
  2. Instances of the class de.schlichtherle.io.File which do not recognize the same set of archive files in the path due to the use of a differently working de.schlichtherle.io.ArchiveDetector.
  3. Other definitions of the classes in this package which have been loaded by different class loaders.
  4. Other system processes.

Explicit unmounting can simply be performed by calling File.umount() like this:

try {
    File.umount(); // with or without parameters
} catch (ArchiveWarningException oops) {
    // Only instances of the class ArchiveWarningException exist in
    // the sequential chain of exceptions. We decide to ignore this.
} catch (ArchiveException ouch) {
    // At least one exception occurred which is not just an
    // ArchiveWarningException. This is a severe situation which
    // needs to be handled.
    // Print the sequential chain of exceptions in order of
    // descending priority and ascending appearance.
    //ouch.printStackTrace();
    // Print the sequential chain of exceptions in order of
    // appearance instead.
    ouch.sortAppearance().printStackTrace();
}

However, calling umount() too often can cause serious performance degradation: Unmounting a modified archive file is a linear runtime operation. If the size of the resulting archive file is s bytes, the operation always completes in O(s), even if only a single, small archive entry has been modified within a very large archive file. Unmounting an unmodified or newly created archive file is a constant runtime operation: It always completes in O(1). These magnitudes are independent of whether unmounting was performed explicitly or implicitly. Now if the client application modifies each entry in a loop and accidentally triggers unmounting the archive file on each iteration, then the overall runtime increases to O(s*s)!
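To get a feeling for these magnitudes, here is a small back-of-the-envelope model. It assumes an archive file with n entries of equal size where each unmount of the modified archive file rewrites all of its bytes; the class UmountCostSketch and its methods are hypothetical and only count bytes, they don't call TrueZIP.

```java
public class UmountCostSketch {

    /** Bytes written if the archive file is unmounted after each of its entries is modified. */
    public static long unmountEachTime(int entries, long entrySize) {
        long archiveSize = entries * entrySize; // s bytes
        return entries * archiveSize;           // n unmounts, each rewriting s bytes
    }

    /** Bytes written if the archive file is unmounted once after all modifications. */
    public static long unmountOnce(int entries, long entrySize) {
        return entries * entrySize;             // a single pass over s bytes
    }

    public static void main(String[] args) {
        int entries = 1000;
        long entrySize = 1024;
        System.out.println(unmountEachTime(entries, entrySize)); // prints 1024000000
        System.out.println(unmountOnce(entries, entrySize));     // prints 1024000
    }
}
```

In this model, updating 1000 entries of 1 KB each writes about 1 GB instead of about 1 MB if unmounting is triggered on every iteration.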

Here are some guidelines to find the right balance between performance and control:

  1. When the JVM terminates, TrueZIP's JVM shutdown hook takes care of unmounting and prints the stack trace of any exceptions to the standard error output. However, calling umount() is recommended in order to handle exceptions explicitly. Furthermore, shutdown hooks should not take long, but updating a large archive file may take a considerable amount of time doing I/O.
  2. Otherwise, in order to achieve best performance, umount() or update() should not get called unless either third party access or explicit exception handling is required.
  3. For the same reason, these methods should never get called in a loop which modifies an archive file.
  4. umount() is generally preferred over update() for safety reasons.

To get the full story, please refer to the package Javadoc for de.schlichtherle.io.

Recognizing False Positives

If an archive detector hits a file path which probably denotes an archive file, the file is said to be a prospective archive file. On the first read or write access to a prospective archive file, TrueZIP checks its true state in cooperation with the registered archive driver for the respective file suffix. If the true state of the file turns out to be actually a directory or to be incompatible with the archive file format, it's said to be a false positive archive file.

Just like a native file system, the behavior of all read and write operations in TrueZIP depends on the true state of a file. This is an important concept which ensures that TrueZIP cannot get fooled by a false positive archive file. However, a false positive archive file sometimes fools the user. Consider the following example: The ZIP file archive.zip has been created by 7-Zip and uses 7-Zip's popular, but proprietary LZMA compression. Now a user would probably expect new File("archive.zip").list() to return the contents of the virtual root directory in the archive file. However, this method simply returns null because archive.zip is not compatible with the ZIP file format specification and hence is treated like a regular file. For regular files, however, the contract of File.list() requires null to be returned.

Customizing Archive File Suffixes and Archive Drivers

Out of the box, the global registry is configured to support the ZIP and TAR file family with all known relatives. For a complete reference of all supported archive file formats and their parameters please refer to the API Overview.

However, in order to make TrueZIP's JAR self-contained and avoid unwanted side effects, the default configuration causes the File class to recognize the archive file suffixes "ear|jar|war|zip" only (whereby case is ignored). Even if this is a superset of the archive file suffixes which need to get recognized in paths, the client application should always customize this set:

  • Treating an archive file like a virtual directory although it's not required causes unnecessary memory and runtime overhead and may even confuse the client application or the user.
  • The set of archive file suffixes recognized by default may be extended without prior notice in future releases of TrueZIP, which adds to the previous point.

Typical Configuration Tasks

Task #1: Selecting supported Archive File Suffixes

With this task, a subset of all archive file suffixes registered in the global registry by TrueZIP's default configuration file shall get recognized.

Suppose that the client application needs to support the suffix set "tar.bz2|tar.gz|zip". Note that for the TAR and BZIP2 support to work, ant.jar from Apache's Ant, version 1.8.1 or higher must be added to the run time class path.

Since archive drivers for all three archive file suffixes are already registered by TrueZIP's default configuration file, the client application just needs to make these suffixes recognized whenever a File instance is created.

Task #2: Defining a custom Document File Format

With this task, a custom application (container) file format shall get supported by registering and recognizing a custom archive file suffix for a registered archive driver.

Suppose that the client application needs to implement a custom document file format. Because a document can be composed of many complex elements, it's a good idea to choose a well known archive file format and store the elements as its entries. This makes the custom document file format easily extensible without breaking backwards compatibility in future releases. If a custom document file suffix is associated to it, the use of the archive file format is completely opaque to the users: They just see flat files with a custom suffix. This is exactly the idea of the Java Archive format (JAR) or the OpenDocument Format (ODF).

For best results, it's recommended to adopt the JAR file format for this purpose. The JAR file format is well documented and provides some benefits over other archive file formats:

  • It uses UTF-8 for file name encoding and comments - unlike ZIP, which only uses IBM437.
  • It provides a central directory for fast and storage efficient random read access to arbitrary archive entries - unlike TAR.

The client application also has the option of transparently encrypting the contents of the document file by adopting TrueZIP's custom TZP file format. A TZP file is actually a regular JAR file which is entirely encrypted and wrapped in TrueZIP's custom Random Access Encryption Specification (RAES) file format. RAES uses the Advanced Encryption Standard (AES) in Counter (CTR) mode in order to support transparent, fast random read access. For more information about RAES, please refer to the Javadoc of the package de.schlichtherle.crypto.io.raes.

Just for the fun of it, let's assume you want to adopt the JAR and TZP file formats with "foo" and "bar" as their respective custom suffixes. Document files with a .foo suffix should then be unencrypted JAR files, while document files with a .bar suffix should be JAR files which have been encrypted and wrapped in the RAES file format. This yields the additional benefit that any .foo file can be converted to an encrypted .bar file by a simple copy operation and vice versa - if provided with a valid key.

Task #3: Supporting a new Archive File Format

With this task, a new archive type shall get supported by registering and recognizing a custom archive file suffix and archive driver.

This is actually the same as the previous task. The only difference is that a custom archive driver needs to be registered instead of reusing an archive driver which is already supported out of the box. Because of this similarity, this task is not further explained in the following section.

Approaches to accomplish these Tasks

This section explains the different approaches to accomplish these two tasks in order of descending preference and ascending priority.

Approach #1: Supplying a Configuration File on the Class Path

This approach uses the configuration files on the class path to configure TrueZIP. It affects all instances of the File class unless an ArchiveDetector is explicitly provided.

Task #1: Selecting supported Archive File Suffixes

With this approach, a file with the relative path META-INF/services/de.schlichtherle.io.registry.properties needs to be put somewhere on the class path which contains the following single line:

DEFAULT=tar.bz2|tar.gz|zip

Note that DEFAULT is a keyword and must be written in uppercase letters. The case of tar.bz2|tar.gz|zip doesn't matter, but the canonical form is all lowercase letters.

The value of the DEFAULT entry depends on TrueZIP's configuration files on the class path which must configure the global registry to hold mappings for all archive file suffixes specified in the list to their respective archive drivers. The default configuration file truezip.jar/META-INF/services/de.schlichtherle.io.registry.properties holds mappings for the archive file suffixes tar.bz2, tar.gz and zip and many more, so this is not an issue.

The effect is that the predefined ArchiveDetector.DEFAULT now recognizes only the specified archive file suffixes. This archive detector is used for new File instances unless the class property defaultArchiveDetector has been changed or an archive detector is explicitly provided to the constructor.

Task #2: Defining a custom Document File Format

This time, the configuration file needs to look like this:

DEFAULT=foo|bar
foo=de.schlichtherle.io.archive.zip.JarDriver
bar=de.schlichtherle.io.archive.zip.raes.SafeZipRaesDriver

The second and third lines register mappings from an archive file suffix (e.g. foo) to its respective archive driver (e.g. de.schlichtherle.io.archive.zip.JarDriver). Needless to say, the name of the archive driver class is case sensitive. Again, the archive file suffix is case insensitive, but the canonical form is all lowercase letters.

Note that unlike the previous task, this task's configuration file is self contained: It does not rely on any other configuration file to be present on the class path.

Approach #2: Using the Command Line

This approach is even simpler than the previous one. However, it's only applicable for the first task and requires TrueZIP 6.5.2 or higher. Like the first approach, it affects all instances of the File class unless an ArchiveDetector is explicitly provided. If both approaches are used, this approach takes priority. This is intended to be used by users who want to override the configuration on a case by case basis. Client applications should not depend on this.

Task #1: Selecting supported Archive File Suffixes

With this approach, the client application simply configures the list of archive file suffixes recognized by default on the command line like this:

$ java "-Dde.schlichtherle.io.default=tar.bz2|tar.gz|zip" -cp truezip.jar ...

Again, de.schlichtherle.io.default is a keyword and must be written in lowercase letters. The case of tar.bz2|tar.gz|zip doesn't matter, but the canonical form is all lowercase letters.

The dependencies on the global registry and the effect are the same as in the previous approach.

Approach #3: Setting the defaultArchiveDetector Property in the File class

This approach is equivalent to the first approach if and only if the property is set right at the start of the client application, before the first File instance is created.

Task #1: Selecting supported Archive File Suffixes

With this approach, the client application needs to call the following statement right at application startup:

File.setDefaultArchiveDetector(new DefaultArchiveDetector("tar.bz2|tar.gz|zip"));

The effect is that this archive detector is used for new File instances unless an archive detector is explicitly provided to the constructor. This overrides any DEFAULT entry in configuration files, but still uses their registered mappings for archive file suffixes and archive drivers.

Just like the previous approaches, this depends on TrueZIP's configuration files on the class path which must configure the global registry to hold mappings for all archive file suffixes specified in the list to their respective archive drivers.

Task #2: Defining a custom Document File Format

For this task, the client application needs to call the following statements:

File.setDefaultArchiveDetector(new DefaultArchiveDetector(
        ArchiveDetector.NULL, // delegate
        new String[] {
            "foo", "de.schlichtherle.io.archive.zip.JarDriver",
            "bar", "de.schlichtherle.io.archive.zip.raes.SafeZipRaesDriver",
        }));

Note the use of ArchiveDetector.NULL as a delegate whose registry is to be used in case there is no mapping in the local registry for a given suffix. Since the NULL instance does not contain any mappings, .foo and .bar will be the only recognized suffixes. Alternatively, the predefined instances ArchiveDetector.DEFAULT or ArchiveDetector.ALL could be used.

Instead of passing fully qualified class name strings, archive driver instances could also be passed. If class names are passed, they are loaded using the current thread's context class loader - so please make sure that the class loader hierarchy is set up correctly. In any case, the parameter must refer to an implementation/instance of the ArchiveDriver interface in the package de.schlichtherle.io.archive.spi.

Just like the first approach to this task, this configuration is self contained.

Approach #4: Explicitly providing an ArchiveDetector

With this approach, the client application explicitly provides an ArchiveDetector to the constructor or method of the File class instead of relying on the default settings.

Warning: If incorrectly used, this approach may show unwanted side effects: TrueZIP caches state information of updated archive files on the heap and in temporary files until the archive file gets unmounted. If the same file path is accessed using different archive detectors, this state information may get bypassed. Depending on how the client application operates on files, this may even cause loss of data!

So this approach shouldn't be used unless there's good reason to. One such reason is to create verbatim copies of archive files which would otherwise be treated like directories and hence be copied recursively (see below).

Task #1: Selecting supported Archive File Suffixes

Instead of relying on defaults, the client application may also provide an archive detector explicitly when creating a File instance:

ArchiveDetector detector = new DefaultArchiveDetector("tar.bz2|tar.gz|zip");
File file = new File("archive.zip", detector);

Note that unlike the previous examples and for completeness only, this example shows the instantiation of the File class.

Task #2: Defining a custom Document File Format

For completeness, here's the same approach using an explicitly provided archive detector to accomplish the second task:

ArchiveDetector detector = new DefaultArchiveDetector(
        ArchiveDetector.NULL, // delegate
        new Object[] {
            "foo", new de.schlichtherle.io.archive.zip.JarDriver(),
            "bar", new de.schlichtherle.io.archive.zip.raes.SafeZipRaesDriver(),
        });
File file = new File("archive.foo", detector);

This time, the constructor is passed driver instances instead of class names.

Copying and Concatenating

Archive files are typically created (packed) and unpacked from and to directory trees in the real file system. According to the TrueZIP logic, this is essentially a recursive copy operation between the virtual root directory of the archive file and the real directory in the native file system.

TrueZIP's File class provides a family of methods which make this task as simple as possible. At the same time, these methods also provide improved performance (equivalent to the transferTo() and transferFrom() methods of java.nio.channels.FileChannel) and some enhanced features. For example, under the hood all copy methods work asynchronously: A background thread reads the data from the source, filling up a ring of large buffers, while the foreground thread writes the data to the destination, flushing the ring. If the source and destination are actually entries in a ZIP file, the data is copied directly without decompressing and recompressing it again - this is called Direct Data Copying (DDC). These are just two of the many built-in optimizations which provide improved performance when copying archive files. For more information, please refer to the Javadoc for the File class.
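The asynchronous copying scheme can be approximated with a bounded queue from java.util.concurrent. This is only a sketch of the general technique and not TrueZIP's implementation: it allocates a fresh chunk per read instead of recycling a true ring of buffers, the buffer size and queue capacity are arbitrary, and the class name AsyncCopySketch is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class AsyncCopySketch {

    // Sentinel (compared by identity) which marks the end of the input.
    private static final byte[] EOF = new byte[0];

    public static void copy(InputStream in, OutputStream out) throws IOException {
        BlockingQueue<byte[]> ring = new ArrayBlockingQueue<>(4);
        IOException[] readFailure = new IOException[1];

        // Background thread: reads from the source and fills the queue.
        Thread reader = new Thread(() -> {
            try {
                try {
                    byte[] buf = new byte[8192];
                    for (int n; (n = in.read(buf)) != -1; ) {
                        if (n > 0) {
                            ring.put(Arrays.copyOf(buf, n)); // blocks while the queue is full
                        }
                    }
                } catch (IOException e) {
                    readFailure[0] = e; // reported to the caller after the queue is drained
                }
                ring.put(EOF); // reached after the end of input or a read failure
            } catch (InterruptedException e) {
                // The writer gave up and interrupted this thread; just stop reading.
            }
        });
        reader.setDaemon(true);
        reader.start();

        // Foreground thread: drains the queue and writes to the destination.
        try {
            for (byte[] chunk; (chunk = ring.take()) != EOF; ) {
                out.write(chunk);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("copy interrupted", e);
        } finally {
            reader.interrupt(); // harmless if the reader has already finished
        }
        if (readFailure[0] != null) {
            throw readFailure[0];
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        new Random(42).nextBytes(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(data), out);
        System.out.println(Arrays.equals(data, out.toByteArray())); // prints true
    }
}
```

The bounded queue lets reading and writing overlap while limiting the amount of data which is buffered on the heap at any time.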

In order to keep the examples of this section as short as possible, it's assumed that you have set up the configuration as explained in the previous section.

Important: Note that the File class does not do path name expansion. So for any file system operation, the client application has to provide complete (relative or absolute) path names.

Here's how to pack a directory tree into a ZIP file (for unpacking, simply swap the source and destination):

new File("directory").copyAllTo(new File("archive.zip"));

This method recursively copies the contents of the "directory" to the ZIP file "archive.zip".

Here's how to convert the ZIP file created in the last example to a TAR.BZ2 file:

new File("archive.zip").archiveCopyAllTo(new File("archive.tar.bz2"));

This time we use the method File.archiveCopyAllTo(File) to copy the last modification time of all entries in the source to the destination, too. Note that as soon as JSR 203 is in place, this method should copy all accessible meta data - not just the last modification time. For now, unfortunately this is all it can do.

Consider the task of defining a custom document file format again: Document files with a .foo suffix should be stored unencrypted while document files with a .bar suffix should be stored encrypted.

Here's how to convert an unencrypted document file to an encrypted document file:

new File("document.foo").archiveCopyAllTo(new File("document.bar"));

When this method is called, the user will be prompted with a GUI dialog to enter a new password for "document.bar", unless this file already exists, in which case the user will be prompted to enter the password for the existing file. If no GUI is available and the JRE conforms to Java SE 6, the user is prompted on the console instead. If no console is available, the operation fails. Please see below how to customize passwords.

Since archive files may be arbitrarily nested, there's also a way to specify whether archive files inside the top level source and destination folders shall be recognized as virtual directories or plain files.

Here's how to unpack an archive file including all archive files nested in it, while inhibiting the recognition of archive files in the destination:

new File("archive.zip").archiveCopyAllTo(new File("directory"),
ArchiveDetector.DEFAULT, ArchiveDetector.NULL);

The archive detector parameters are respectively used when recursing into source and destination folders: ArchiveDetector.DEFAULT is passed to the constructor of all File instances for the source folder and ArchiveDetector.NULL is passed to the constructor of all File instances for the destination folder. So the effect of this call is that all enclosed archive files in the source tree are copied to plain directories with the same name in the destination tree.

Warning: Using different archive detectors for the same file system object in different operations may bypass the state TrueZIP associates with each archive file, which may result in loss of data! In this particular example, if ArchiveDetector.DEFAULT did not recognize an archive file which had been recognized before by another archive detector (e.g. ArchiveDetector.ALL), the state associated with the archive file would be bypassed and the copy in the destination might get corrupted. As you can guess, this is a pretty unlikely event, but it may happen. Again, the best protection is never to use explicitly provided archive detectors.

Important: Note that the previously shown methods all return the boolean value false rather than throwing an IOException to indicate failure! When I designed these methods, I thought that coherence with the legacy methods in java.io.File (which just return boolean values, too) would be more important than a decent exception handling mechanism. I was certainly wrong and I apologize for this fault.
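A client application can turn the boolean result into an exception itself. The following is only a sketch: the class Checked and its method requireSuccess are hypothetical helpers for this illustration and are not part of TrueZIP.

```java
import java.io.IOException;

// Hypothetical helper, not part of the TrueZIP API.
final class Checked {
    private Checked() {
    }

    /** Throws an IOException naming the operation if its boolean result was false. */
    static void requireSuccess(boolean ok, String operation) throws IOException {
        if (!ok) {
            throw new IOException(operation + " failed");
        }
    }
}
```

With this helper, the packing example from above could be written as Checked.requireSuccess(new File("directory").copyAllTo(new File("archive.zip")), "packing archive.zip"). Note that the exception can only tell which operation failed, not why, because the boolean result carries no further information.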

If this bothers you, there's another option: The File class provides additional static methods named cp and cp_p which throw an IOException to indicate failure. Unfortunately, none of these methods work recursively (yet).

Here's how to copy a single file using these methods, preserving the last modification time:

File.cp_p(new File("file1"), new File("file2"));

The method name is modelled after the Unix command line utility cp with the -p option. Unlike the Unix command line utility however, it doesn't do path name expansion. Note again that this method throws an IOException on failure.

Finally, we get to the low level transport: If all you need is pumping data from an arbitrary input stream to an arbitrary output stream, please use the following:

File.cat(in, out);

where in and out are InputStream and OutputStream instances, respectively. This method throws an InputIOException if it can't read from the input stream and an IOException if it can't write to the output stream. Neither stream is ever closed, so you can use this method to concatenate data, too. Again, the method name is modelled after the Unix command line utility cat.

Note that this method can't use DDC, but it still benefits from asynchronous data copying in order to provide superior performance.
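The contract described above (pump the data, never close a stream, tell read failures from write failures) can be sketched with the plain JDK as follows. This is a synchronous stand-in for illustration only: the names CatSketch and ReadFailedException are made up here, with ReadFailedException playing the role which InputIOException plays in TrueZIP.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration, not part of the TrueZIP API.
final class CatSketch {
    /** Marks a failure to read from the input (stand-in for InputIOException). */
    static final class ReadFailedException extends IOException {
        ReadFailedException(IOException cause) {
            super(cause);
        }
    }

    /** Pumps all data from in to out. Neither stream is closed. */
    static void cat(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        while (true) {
            int n;
            try {
                n = in.read(buf);
            } catch (IOException e) {
                throw new ReadFailedException(e); // the input side failed
            }
            if (n < 0) {
                return; // end of input; out stays open for further data
            }
            out.write(buf, 0, n); // a failure here is a plain IOException
        }
    }

    /** Concatenates several in-memory parts into one output. */
    static byte[] concat(byte[]... parts) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            cat(new ByteArrayInputStream(part), out);
        }
        return out.toByteArray();
    }
}
```

Because the output stream is left open, calling cat repeatedly with different inputs and the same output concatenates the data, as concat does here.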

Using RAES encrypted ZIP files

RAES encrypted ZIP files are no more than regular ZIP files which use UTF-8 as their character set encoding (like JARs) and have been encrypted according to the RAES file format. The RAES file format enables transparent random access to its AES encrypted content as if the application were reading decrypted data from a RandomAccessFile. RAES is not at all specific to ZIP files - any kind of content can be encrypted. TrueZIP uses RAES to enable transparent access to encrypted ZIP files as if they were (virtual) directories.

To access RAES encrypted ZIP files, the client application needs to

  1. configure TrueZIP to recognize the file suffixes configured in the global registry (which are .tzp, .zip.rae and .zip.raes by default), and

  2. have the JAR for Bouncy Castle's Lightweight Crypto API for JDK 1.4 Version 1.30 or higher (lcrypto.jar) on the runtime class path. You can download it by searching for "lcrypto-jdk14-130" at http://www.bouncycastle.org/latest_releases.html.

To meet the first requirement, ArchiveDetector.ALL could simply be used. However, the client application would then not only need to have lcrypto.jar on the run time class path, but also all other dependencies required to access the archive types of all the other suffixes recognized in the global registry. So it's probably better to use a custom archive detector. The following example configures TrueZIP to recognize just the canonical file suffixes for ZIP files and RAES encrypted ZIP files (see before):

File.setDefaultArchiveDetector(new DefaultArchiveDetector("zip|tzp|zip.rae|zip.raes"));

Now the following file output stream writes to the entry README within the RAES encrypted ZIP file secret.tzp:

OutputStream out = new FileOutputStream("secret.tzp/README");

As you see from this example, once TrueZIP is configured, the client application treats RAES encrypted ZIP files like any other virtual directory.

If you test this example, you should be prompted for a password with a Swing based GUI or a console dialog. This is controlled by the key manager. To configure the default implementation, please refer to the section on key management below.
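To illustrate how a suffix list such as "zip|tzp|zip.rae|zip.raes" can single out the archive file within a path name like secret.tzp/README, here is a rough sketch in plain Java. It only mimics the idea: the class SuffixSketch and its methods are hypothetical, and the matching rules of DefaultArchiveDetector may differ in detail.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration, not part of the TrueZIP API.
final class SuffixSketch {
    private final Set<String> suffixes;

    /** Takes a list of suffixes separated by '|', e.g. "zip|tzp". */
    SuffixSketch(String list) {
        suffixes = new HashSet<>(
                Arrays.asList(list.toLowerCase(Locale.ROOT).split("\\|")));
    }

    /** Returns the longest registered suffix of the given name or null. */
    String suffixOf(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        // Every '.' starts a candidate suffix; the leftmost one is the longest.
        for (int i = lower.indexOf('.'); i >= 0; i = lower.indexOf('.', i + 1)) {
            String candidate = lower.substring(i + 1);
            if (suffixes.contains(candidate)) {
                return candidate;
            }
        }
        return null;
    }

    /** Returns the index of the first path segment naming an archive, or -1. */
    int firstArchiveSegment(String path) {
        String[] segments = path.split("/");
        for (int i = 0; i < segments.length; i++) {
            if (suffixOf(segments[i]) != null) {
                return i;
            }
        }
        return -1;
    }
}
```

For the path name secret.tzp/README, the first segment matches the suffix tzp, so everything after it would be treated as an entry name within the archive file.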

Miscellaneous:

  • RAES is not compatible with WinZip's encryption scheme. This is because of security issues with WinZip's encryption scheme. For more information, please refer to the news section on RAES.

  • All encryption/decryption is done in Bouncy Castle's lcrypto.jar. truezip.jar itself does not contain any encryption/decryption algorithms and hence should not be subject to export restrictions in the USA. This implies that US organizations could safely export products which use truezip.jar, but don't need support for RAES encrypted ZIP files, which would require lcrypto.jar. However, this information is without any warranties: I'm not a lawyer and I'm not even a US resident.

  • If you would like to use TrueZIP's RAES implementation to encrypt custom (non-ZIP file) data, please refer to the source code of the utility main classes encrypt and decrypt in the base package.

  • RAES uses AES in CTR mode with a maximum key strength of 256 bits. For PRNG and authentication, SHA-256 is used. RAES is extensible, so other cipher algorithms or digests could be added - CTR is the only invariant requirement. For more information please refer to the Javadoc for the package de.schlichtherle.crypto.io.raes.
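The reason why CTR mode permits random access can be shown with the JDK's own javax.crypto API: The key stream for any block depends only on the key and the block's counter value, so decryption can start at any block boundary. The following sketch is not the RAES implementation and says nothing about the RAES file layout, key derivation or authentication. The class CtrSeekSketch is made up for this illustration, and the test data uses a 128 bit key just to keep it runnable on JREs with a restricted crypto policy.

```java
import java.math.BigInteger;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration, not part of the TrueZIP API.
final class CtrSeekSketch {
    static final int BLOCK = 16; // AES block size in bytes

    /** Returns the counter block for a block index: (iv + index) mod 2^128. */
    static byte[] counterAt(byte[] iv, long blockIndex) {
        byte[] raw = new BigInteger(1, iv)
                .add(BigInteger.valueOf(blockIndex))
                .toByteArray();
        byte[] counter = new byte[BLOCK];
        int n = Math.min(raw.length, BLOCK);
        // Keep the low order 16 bytes, right aligned.
        System.arraycopy(raw, raw.length - n, counter, BLOCK - n, n);
        return counter;
    }

    /** In CTR mode, encryption and decryption are the same XOR operation. */
    static byte[] crypt(byte[] key, byte[] counter, byte[] data, int off, int len) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE,
                    new SecretKeySpec(key, "AES"),
                    new IvParameterSpec(counter));
            return cipher.doFinal(data, off, len);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Decrypts from the given block to the end without touching earlier data. */
    static byte[] decryptFromBlock(byte[] key, byte[] iv, byte[] ciphertext,
                                   int blockIndex) {
        int off = blockIndex * BLOCK;
        return crypt(key, counterAt(iv, blockIndex),
                ciphertext, off, ciphertext.length - off);
    }
}
```

A chained mode such as CBC encryption could not be used this way for writing, because each block would depend on all blocks before it.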

Converting Archive Files to RAES encrypted ZIP Files and vice versa

TrueZIP's JAR contains a utility main class called nzip in the base package which you can use to easily convert any archive file to an RAES encrypted ZIP file and vice versa.

The following example converts an ordinary ZIP file to an RAES encrypted ZIP file (this example assumes a UNIX platform):

$ rm archive.tzp
$ java -cp truezip.jar:lcrypto.jar nzip cp archive.zip archive.tzp

The following call reverses the operation:

$ rm archive.zip
$ java -cp truezip.jar:lcrypto.jar nzip cp archive.tzp archive.zip

The initial removal of the target file is just required to prevent the nzip utility main class from copying the source archive into the destination archive if the latter already exists.

The nzip utility main class provides a lot of other Unix-like commands (cp, ll, mv, rm, ...). For more information, please refer to its inline help which is printed if the class is run without arguments.

Private Key Management

Like any kind of symmetric cipher, the AES cipher used for the RAES file format requires a private key. In order to maintain full transparency for the client application, this archive type specific "detail" is handled by the archive driver in cooperation with the KeyManager class in the package de.schlichtherle.key.

The key manager is a singleton which maps protected resources to their respective key provider: A protected resource can be virtually anything which can be identified by a unique string (the resource ID) and the key provider can be any instance of the KeyProvider interface. This interface is generic enough to be agnostic of the actual source and type of a private key: The source could be a GUI dialog, a text console dialog, a secure database, a secure file, etc. and the key may be an arbitrary object.

Default Key Manager Implementations

Out of the box, TrueZIP provides two default implementations of the key manager:

  1. If the JVM is not running in headless mode, an instance of the class de.schlichtherle.key.passwd.swing.PromptingKeyManager is used. This key manager uses key provider implementations which prompt the user with a Swing based dialog whenever a private key is required to create or open a protected resource for the first time. The key provider implementations support passwords and key files. For a key file, the first 512 bytes of the file are used as the key. If a new protected resource is to be created and if the file is not read-only, is shorter than 512 bytes or the first 512 bytes are not a good source of entropy (e.g. contain only null bytes), it's rejected and the user is asked to provide a different file again.
  2. If the JVM is running in headless mode and is compatible with Java SE 6, an instance of the class de.schlichtherle.key.passwd.console.PromptingKeyManager is used. This key manager uses key provider implementations which prompt the user on the text console whenever a private key is required to create or open a protected resource for the first time. The key provider implementations support passwords only.

If none of the prerequisites is met or the console is not connected to the JVM, key prompting is disabled and any attempt to create or open a protected resource fails with an exception.
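The selection rules above boil down to two checks which the JDK offers directly. Here is a simplified sketch: The class KeyManagerChoiceSketch and its enum are hypothetical, and the check for Java SE 6 is left out because System.console() only exists since that release anyway.

```java
import java.awt.GraphicsEnvironment;

// Hypothetical illustration, not part of the TrueZIP API.
final class KeyManagerChoiceSketch {
    enum Choice { SWING, CONSOLE, DISABLED }

    /** Pure decision logic, mirroring the rules described in the text. */
    static Choice choose(boolean headless, boolean consoleAttached) {
        if (!headless) {
            return Choice.SWING;   // a GUI is available: prompt with Swing
        }
        return consoleAttached
                ? Choice.CONSOLE   // headless, but a console is connected
                : Choice.DISABLED; // nobody to ask: key prompting is disabled
    }

    /** Applies the decision logic to the running JVM. */
    static Choice chooseForThisJvm() {
        return choose(GraphicsEnvironment.isHeadless(), System.console() != null);
    }
}
```

Note that System.console() returns null if the standard streams are redirected, for example when the application is started by a service manager.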

Note that all dialogs are internationalized. Currently only English and German are implemented, but you can easily translate them to your favorite locale. For more information, please refer to the source code and/or contact the mailing list.

How the Key Manager is used for RAES encrypted ZIP Files

Whenever a member of the archive driver family for RAES encrypted ZIP files accesses an archive file, it asks the key manager for a key provider, using the canonical path (not just the absolute path) of the archive file as the resource ID. The type of key provider requested by the archive driver is AesKeyProvider, a subinterface of KeyProvider. This interface holds an additional property required for the key strength of the AES encryption.

The key manager looks up its registry of key provider instances. If a key provider instance doesn't exist yet for the given resource ID, the registry of key provider types is looked up for a class to instantiate. The new instance is then returned to the archive driver.
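The two step lookup can be pictured as two maps: one from resource IDs to key provider instances and one from requested types to implementation classes. The following sketch only models this idea with made-up names (KeyRegistrySketch, Provider, AesProvider, PromptingAesProvider) and is not the code of the KeyManager class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not part of the TrueZIP API.
final class KeyRegistrySketch {
    interface Provider { }
    interface AesProvider extends Provider { }
    static final class PromptingAesProvider implements AesProvider { }

    // Registry of key provider instances, keyed by resource ID.
    private final Map<String, Provider> providers = new HashMap<>();
    // Registry of key provider types: requested type -> class to instantiate.
    private final Map<Class<? extends Provider>, Class<? extends Provider>> types
            = new HashMap<>();

    synchronized <T extends Provider> void mapType(
            Class<T> requested, Class<? extends T> implementation) {
        types.put(requested, implementation);
    }

    synchronized <T extends Provider> T getProvider(String resourceId, Class<T> type) {
        Provider provider = providers.get(resourceId);
        if (provider == null) {
            // No instance yet: look up the class to instantiate for this type.
            Class<? extends Provider> impl = types.get(type);
            if (impl == null) {
                throw new IllegalArgumentException(
                        "no provider type mapped for " + type.getName());
            }
            try {
                provider = impl.getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
            providers.put(resourceId, provider);
        }
        // Fails with a ClassCastException if an incompatible provider
        // has been registered for this resource ID before.
        return type.cast(provider);
    }
}
```

The effect is that the same resource ID always yields the same key provider instance, so a key which has been entered once can be reused for later accesses to the same resource.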

The archive driver then calls the key provider to retrieve the private key. If the key provider throws an exception because key prompting has been cancelled by the user or is disabled, the exception is wrapped in an instance of a subclass of FileNotFoundException and propagated to the client application wherever appropriate - some methods in the File class simply return the boolean value false instead.

Customizing Key Management

This architecture provides several levers to customize key management. The following sections give an overview of how this can be done.

Approach #1: Custom Key Provider

This approach works best if you want to provide a different private key for each protected resource. As an example, you may want to hard code the private key for a particular RAES encrypted ZIP file so that the user is not bothered with prompting.

Warning: Hard coding a private key in Java pretty much spoils any security concept because the private key can be easily obtained by reverse engineering, even if you obfuscate the entire code!

To follow this approach, you need to implement the de.schlichtherle.key.KeyProvider interface and register it for the identifier (ID) of the particular protected resource with the key manager. More precisely, in case of RAES encrypted ZIP files, the ID of the protected resource is always the canonical path of the particular archive file (not just an absolute path) and the key provider must be an implementation of the AesKeyProvider interface in the same package (not just the KeyProvider interface). Otherwise the user will still be prompted or see a ClassCastException respectively.

Consider the following example:

public class SimpleAesKeyProvider implements AesKeyProvider {
    public Object getCreateKey() throws UnknownKeyException {
        return "secret".toCharArray(); // returns cloned char array!
    }

    public Object getOpenKey() throws UnknownKeyException {
        return "secret".toCharArray(); // returns cloned char array!
    }

    public void invalidOpenKey() {
        // This method is called whenever a key for an existing protected
        // resource is invalid. A real application should deal with this
        // appropriately, of course.
        throw new UnsupportedOperationException("cannot handle invalid keys!");
    }

    public int getKeyStrength() {
        return KEY_STRENGTH_256;
    }

    public void setKeyStrength(int keyStrength) {
        // Ignored: this simple provider always reports KEY_STRENGTH_256.
    }
}

// ...

KeyManager.getInstance().setKeyProvider(
        "/home/acme/archive.tzp",
        new SimpleAesKeyProvider());

where "/home/acme/archive.tzp" would be the canonical path of the RAES encrypted ZIP file to serve as the ID of the protected resource and "secret" would be the password to use as the private key for this file. Note that the password must be cloned on each call and should be a char array, not just a byte array.
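Since the resource ID must be the canonical path, it may be safer to compute it than to hard code it. Here is a small sketch which uses java.io.File from the JDK (not TrueZIP's File class). The class ResourceIdSketch is a hypothetical helper for this illustration.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of the TrueZIP API.
final class ResourceIdSketch {
    private ResourceIdSketch() {
    }

    /**
     * Returns the canonical path of the given archive file.
     * Unlike the absolute path, the canonical path has redundant names
     * such as "." removed, so two spellings of the same file yield the
     * same ID.
     */
    static String resourceId(File archive) throws IOException {
        return archive.getCanonicalPath();
    }
}
```

The result of ResourceIdSketch.resourceId(new java.io.File("archive.tzp")) could then be passed as the first argument to setKeyProvider instead of a string literal.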

Approach #2: Custom Key Manager

This approach works best if you want to provide a different type of key provider for any protected resource. For example, you may want to use a secure database as the source for private keys instead of prompting the user for RAES encrypted ZIP files.

To follow this approach, you need to provide a custom key manager implementation by subclassing the class de.schlichtherle.key.KeyManager or de.schlichtherle.key.PromptingKeyManager. The difference between these base classes is that the latter class supports a pluggable user interface architecture. This could be useful if you would like to port the default key manager implementation from Swing to SWT for example.

Finally, you must set the system property de.schlichtherle.key.KeyManager to the fully qualified class name of the custom key manager implementation at JVM startup or early in the main method of the client application.

Here's a minimal example which reuses the SimpleAesKeyProvider class from the previous example:

public class CustomKeyManager extends KeyManager {
    public CustomKeyManager() {
        mapKeyProviderType(AesKeyProvider.class, SimpleAesKeyProvider.class);
    }
}

// ...

System.setProperty("de.schlichtherle.key.KeyManager", "com.acme.CustomKeyManager");

where "com.acme.CustomKeyManager" would be the fully qualified name of the custom key manager implementation.

The call to mapKeyProviderType() tells the key manager to instantiate the SimpleAesKeyProvider class whenever a new AesKeyProvider is required. This feature replaces the common factory pattern.

Approach #3: Custom Archive Driver

This approach works best if you would like to completely bypass the key manager for a particular type of archive file. For example, you may want to hard code a private key for a custom document file format which is recognized by custom archive file suffixes.

Warning: Hard coding a private key in Java pretty much spoils any security concept because the private key can be easily obtained by reverse engineering, even if you obfuscate the entire code!

To use this approach, you need to implement a custom archive driver. The simplest way to do this is to subclass the class SafeZipRaesDriver and override the method getRaesParameters(Archive). This method is expected to return an instance of the RaesParameters interface, which is just a marker interface. RAES can support several authentication methods. Type 0 files use password based authentication. To read and write RAES files of this type, this method should return an instance of the interface Type0RaesParameters, which is very similar to the AesKeyProvider interface you've seen before. Next, an instance of the custom archive driver must be registered in a DefaultArchiveDetector. Finally, you need to use the archive detector explicitly or configure TrueZIP to use it implicitly.

Here's an example:

public class SimpleRaesParameters implements Type0RaesParameters {
    public char[] getCreatePasswd() throws RaesKeyException {
        return "secret".toCharArray(); // returns cloned char array!
    }

    public char[] getOpenPasswd() throws RaesKeyException {
        return "secret".toCharArray(); // returns cloned char array!
    }

    public void invalidOpenPasswd() {
        throw new UnsupportedOperationException("cannot handle invalid passwords!");
    }

    public int getKeyStrength() {
        return KEY_STRENGTH_256;
    }

    public void setKeyStrength(int keyStrength) {
        // Ignored: these parameters always report KEY_STRENGTH_256.
    }
}

public class CustomArchiveDriver extends SafeZipRaesDriver {
    public RaesParameters getRaesParameters(Archive archive) {
        return new SimpleRaesParameters();
    }
}

// ...

File.setDefaultArchiveDetector(new DefaultArchiveDetector(
        ArchiveDetector.DEFAULT, // delegate
        "app", new CustomArchiveDriver()));

Note the similarity between SimpleRaesParameters and SimpleAesKeyProvider used in the previous examples. Similar to before, the getCreatePasswd() and getOpenPasswd() methods must return a clone of a char array. The CustomArchiveDriver instantiates this class on every call to the method getRaesParameters(Archive). Alternatively, a singleton could get returned. Finally, an instance of the custom archive driver is registered in a DefaultArchiveDetector and set as the default archive detector to use whenever a File instance is created and no archive detector is explicitly provided.

Miscellaneous

There's still more to discover in the TrueZIP API, mostly Swing utility classes for dealing with (archive) files. Unfortunately, I can't go into details for now, but here are some interesting features for which you might consider it worth studying the Javadoc:

  • Since TrueZIP 6.5, OpenDocument Format (ODF) files can be read and written, too. For example, you can use OpenOffice Writer to create a file named "document.odt" and list it with new File("document.odt", ArchiveDetector.ALL).list(). Read the Javadoc for the class de.schlichtherle.io.archive.zip.OdfDriver for more details.
  • Did you ever want file name auto completion like in some Unix shells (e.g. bash) for your Swing based client application? Wouldn't it be cool if it could browse archive files, too? Then have a look at the class de.schlichtherle.io.swing.FileComboBoxBrowser.
  • The class de.schlichtherle.io.swing.JFileTree might come in handy if you want to display and browse a directory tree with Swing, including the contents of archive files. Convenient methods for common tasks (copying, moving, deleting, etc.) are provided, too.
  • If you ever need to browse and pick an entry within archive files with a GUI, consider using the de.schlichtherle.io.swing.JFileChooser class.
  • The package de.schlichtherle.io.samples not only contains the NZip utility main class, but also two other utility main classes which can be used to conveniently Encrypt/Decrypt arbitrary files into/from RAES.
  • The class de.schlichtherle.util.CanonicalStringSet is a convenient and powerful means to operate with expressions such as "ear|jar|war|zip".
  • The package de.schlichtherle.util.zip holds drop-in replacements for the well known Zip* classes in java.util.zip. The classes in this package provide some advanced features when compared to the genuine implementation. Sometimes it's also referred to as the low level API since it's used by the ZIP archive driver family to read and write ZIP files, whereas the high level API consists of the classes in the package de.schlichtherle.io.

That's it, basically. Thank you for reading this tutorial and using TrueZIP!