class: center, middle

# Libarchive

## Multi-format archive and compression for Perl 6

Curt Tilmes

*Curt.Tilmes@nasa.gov*

*Philadelphia Perl Mongers*

2019-05-15

---

## [libarchive.org](https://libarchive.org)

* Multi-format archive and compression library

* Streaming tar, zip, ar, cpio, iso9660, etc.

* Pipelined filters like compress, gzip, bzip2, uuencode

* A number of Perl 5 modules are built on top of libarchive

* There is even an existing Perl 6
  [`Archive::Libarchive`](https://github.com/frithnanth/perl6-Archive-Libarchive)
  module, built on top of
  [`Archive::Libarchive::Raw`](https://github.com/frithnanth/perl6-Archive-Libarchive-Raw)
  (thank you, Fernando Santagata, @frithnanth)

* This work builds on that existing work and extends it with a
  higher-level interface that offers more functionality and is easier to use.

---

## Libarchive::Read

```
use Libarchive::Read;

my $archive := Libarchive::Read.new('myfile.tar.gz');

for $archive -> $entry {
    put $entry.pathname
}
```

* `Libarchive::Read` is `Iterable`, providing a `Seq` of `Libarchive::Entry`
objects

```
.pathname.put for Libarchive::Read.new('myfile.zip');

.put for Libarchive::Read.new('mydvd.iso');

.extract for Libarchive::Read.new($*IN);
```
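
A note on the `:=` in the first example: a `for` loop treats a value held in a `$` scalar container as a single item, so binding (or passing the object straight to `for`) is what lets the loop walk the entries. The sketch below shows this with plain Perl 6 and no archive; `Countdown` is a hypothetical stand-in class, not part of Libarchive.

```raku
# Hypothetical stand-in for an Iterable class such as Libarchive::Read.
class Countdown does Iterable {
    has Int $.from is required;
    method iterator() { ($!from ... 1).iterator }
}

my $bound    := Countdown.new(from => 3);  # bound: no Scalar container
my $assigned  = Countdown.new(from => 3);  # assigned: held in a Scalar container

my ($bound-passes, $assigned-passes) = 0, 0;
for $bound    { $bound-passes++ }     # iterates the elements: 3 passes
for $assigned { $assigned-passes++ }  # treated as one item: 1 pass

put "bound: $bound-passes, assigned: $assigned-passes";
```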

---

## Libarchive::Simple

```
use Libarchive::Simple;

.put for archive-read 'mydvd.iso';

.extract for archive-read $*IN;
```

* `archive-read()` is just shorthand for `Libarchive::Read.new()`

* The archive source can be a filename, an `IO::Path` (also a filename),
an `IO::Handle`, an in-memory `Buf`, a `Supply` of blobs, or a `Channel` of blobs.

---

## Libarchive::Entry

* Sort of a super-stat with all the information about a filesystem entity

* `pathname`, `size`, `uid`, `gid`, `uname`, `gname`, `perm`, `mode`, `filetype`

* `atime`, `mtime`, `ctime`, `birthtime` - converted to `DateTime`

* `symlink`

* `strmode` = human-readable mode like `ls -l`:

`-rw-r--r--`, `drwxr-xr-x`

* `human-size` = `100B`, `25K`, `12M`, etc.

* Boolean `is-file` and `is-dir`

* Both `Str()` and `gist()` provide a single-line summary of the file:

```
-rw-r--r--   ctilmes/ctilmes    16K 2019-03-27 16:02 README.md
```
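
The accessors above can be read off each entry while iterating. This is a minimal sketch, assuming each name listed on this slide is a plain method call and that `archive-write` picks the format from the `.tar.gz` file name as in the writing examples. `example.tar.gz` and `hello.txt` are made-up names.

```raku
use Libarchive::Simple;

# Build a small archive first so there is something to read
# (file names here are invented for the example).
with archive-write('example.tar.gz') {
    .write: 'hello.txt', "Hello\n";
    .mkdir: 'adir';
    .close;
}

# Print a few of the fields listed above for every entry.
for archive-read('example.tar.gz') -> $entry {
    put join ' ', $entry.strmode, $entry.human-size, $entry.mtime, $entry.pathname;
    put '  regular file' if $entry.is-file;
    put '  directory'    if $entry.is-dir;
}
```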
---

## Libarchive::Entry::Read

### Also includes:

* `data` - Returns full binary data from file as a `Buf`

* `content` - Decodes the content as a `utf-8` `Str` (for other
encodings, call `.data.decode` yourself).

* `extract` - Extract the file to the filesystem.
--

```
.content.put for archive-read('this.zip')
                 .grep({.pathname ~~ /README/});

.extract(perm => 0o600, :verbose) for archive-read('foo.tar.gz');

.extract(destpath => '/putithere') for archive-read('this.zip');
```
---

## Extract options

.left-col[* `:extract-owner`
* `:extract-perm`
* `:extract-time`
* `:extract-no-overwrite`
* `:extract-unlink`
* `:extract-acl`
* `:extract-fflags`]

.right-col[* `:extract-xattr`
* `:extract-secure-symlinks`
* `:extract-secure-nodotdot`
* `:extract-secure-noabsolutepaths`
* `:extract-sparse`
* `:extract-clear-nochange-flags`
* `:verbose`]

* `:destpath` - prepended to the pathname (you can also edit `pathname` manually)

You can also use most of the `Libarchive::Entry` fields as
`.extract()` options:

```
.extract(perm => 0o600);
.extract(pathname => .pathname ~ '.new');
```

---

## Libarchive::Write

```
use Libarchive::Write;

my $archive = Libarchive::Write.new('this.zip');
$archive.add('somefile');
$archive.close;
```

or just
```
with Libarchive::Write.new('foo.tar.gz') {
    .add: 'somefile';
    .close;
}
```

* The archive destination can be a filename, an `IO::Path`, an `IO::Handle`,
an in-memory `Buf`, a `Supplier` of blobs, or a `Channel` of blobs.

* If you don't specify a filename, you must specify a `format` (and,
optionally, a `filter`).

* `.close` is critical: the archive is not complete until it is closed.

---

## Libarchive::Simple writing

```
use Libarchive::Simple;

with archive-write($*OUT, format => 'zip') {
    .add: 'afile';
    .add: 'somedir';
    .add: dir('somedir');
    .add: 'thisdir', dir('thisdir');

    .write: 'afile', "Some content\n";
    .write: 'bfile', buf8.new(1,2,3,4);
    .write: 'bigrandomfile',
            '/dev/urandom'.IO.open(:bin),
            size => 100_000;

    .mkdir: 'adir';
    .mkdir: 'bdir', perm => 0o700;

    .symlink: 'linked', 'adir/anotherfile';
    .symlink: 'anotherlink' => 'adir/yetanother';

    .close;
}
```
---

## Process archives in a pipeline

```
use Libarchive::Simple;

with archive-write($*OUT, format => 'zip') {
    .copy: archive-read($*IN, format => 'tar');
    .close;
}
```

```
use Libarchive::Simple;

with archive-write($*OUT, format => 'zip') {
    .write: 'NEWREADME', "This is my README\n";

    .write: 'LICENSE', "Special license file\n";

    .copy: archive-read($*IN, format => 'tar')
           .grep({ .pathname ~~ /good/})
           .map({ .pathname(.pathname.uc) })
           .map({ .uname('fred').perm(0o600)});

    .close;
}
```
---

## Libarchive::Archive

* There are some limitations to streaming archives.  You must access the data
at the right time.

* You can use `Libarchive::Archive` to slurp an archive, including
file contents, into memory:

```
use Libarchive::Archive;

my $archive = Libarchive::Archive.new('this.tar.gz');
put $archive<README>;
```

```
use Libarchive::Simple;

my $archive = archive-slurp 'this.tar';
say $archive;                                  # Listing à la tar -t
put $archive<README>;                          # Actual content
$archive<afile>.content = "Change content\n";
$archive<adir/bad>:delete;
$archive.spurt: 'foo.zip';
```

---
## Libarchive::Archive

* You can also just create the memory archive from scratch

```
use Libarchive::Simple;

with archive-new()
{
    .mkdir: 'adir';
    .write: 'adir/afile', "Some content\n";
    .add: 'fileinfilesystem';
    .spurt: 'this.zip';
}
```

* Not as efficient as streaming, but will work fine for small things that
fit into memory.

---
## format `raw`

* `libarchive` supports a special format called `raw` that just passes
data through as a single fake file.  This allows you to use the
filters in isolation.

```
with archive-write($dest, format => 'raw', filter => 'gzip') {
    .write('ignore-filename', $source, size => ...);
    .close
}
```

```
with archive-read($source, format => 'raw') {
    my $header = .read;  # Read and ignore the archive header
    while my $buf = .read-data(<blocksize>) {
        ...do something with $buf...
    }
}
```

* But don't do it like that!

---
## Libarchive::Filter

* Those `raw` read and write patterns are packaged up into `Libarchive::Filter`.

Files:
```
use Libarchive::Filter;

archive-encode('Some content', 'file.gz', filter => 'gzip');

my $content = archive-decode('file.gz');
```

Memory buffers:
```
use Libarchive::Filter;

my $buf = archive-encode('Some content', filter => 'gzip');

my $content = archive-decode($buf);
```

* Can also use `IO::Path`, `IO::Handle`, Supplies, Channels, etc.

---

## Libarchive::Filter

Some common filters have been packaged up into simple shortcuts:

```
use Libarchive::Filter :gzip;

my $buf = gzip 'Some content';
my $content = gunzip $buf;
```

* `use Libarchive::Filter :all;` to get all the shortcuts.

* Again, streaming directly to or from your final destination will
be more efficient than going through a memory buffer.

---
## IO::Handle example

```
use Libarchive::Simple;

with archive-write('file.tar.gz'.IO.open(:w:bin),
                   format => 'paxr', filter => 'gzip') {
    .write: 'afile', "This is some content\n";
    .close
}

.put for archive-read('file.tar.gz'.IO.open);
```
---
## Supply example

```
use Libarchive::Simple;

my $supplier = Supplier.new;

my $reader = start { .put for archive-read($supplier.Supply) }

with archive-write($supplier, format => 'paxr', filter => 'gzip') {
    .write: 'afile', "This is some content\n";
    .close
}

await $reader;
```
---

## Channel example

```
use Libarchive::Simple;

my $channel = Channel.new;

my $reader = start { .put for archive-read($channel) }

with archive-write($channel, format => 'paxr', filter => 'gzip') {
    .write: 'afile', "This is some content\n";
    .close
}

await $reader;
```
---

## .center[Thank You!]