class: center, middle

# Libarchive

## Multi-format archive and compression for Perl 6

Curt Tilmes

*Curt.Tilmes@nasa.gov*

*Philadelphia Perl Mongers*

2019-05-15

---

## [libarchive.org](https://libarchive.org)

* Multi-format archive and compression library
* Streaming tar, zip, ar, cpio, iso9660, etc.
* Pipelined filters like compress, gzip, bzip2, uuencode
* There are a number of Perl 5 modules built on top of libarchive
* There is even an existing Perl 6 [`Archive::Libarchive`](https://github.com/frithnanth/perl6-Archive-Libarchive) module, built on top of [`Archive::Libarchive::Raw`](https://github.com/frithnanth/perl6-Archive-Libarchive-Raw) (Thank you Fernando Santagata, @frithnanth)
* This work builds on those modules and adds a higher-level interface with more functionality that is easier to use.

---

## Libarchive::Read

```
use Libarchive::Read;

my $archive := Libarchive::Read.new('myfile.tar.gz');

for $archive -> $entry {
    put $entry.pathname
}
```

* `Libarchive::Read` is `Iterable`, providing a `Seq` of `Libarchive::Entry` objects

--

```
.pathname.put for Libarchive::Read.new('myfile.zip');

.put for Libarchive::Read.new('mydvd.iso');

.extract for Libarchive::Read.new($*IN);
```

---

## Libarchive::Simple

```
use Libarchive::Simple;

.put for archive-read 'mydvd.iso';

.extract for archive-read $*IN;
```

* `archive-read()` is just shorthand for `Libarchive::Read.new()`
* Archive source can be: filename, `IO::Path` (also filename), `IO::Handle`, in-memory `Buf`, `Supply` of blobs, `Channel` of blobs.

---

## Libarchive::Entry

* Sort of a super-stat with all the information about a filesystem entity
* `pathname`, `size`, `uid`, `gid`, `uname`, `gname`, `perm`, `mode`, `filetype`
* `atime`, `mtime`, `ctime`, `birthtime` - converted to `DateTime`
* `symlink`
* `strmode` = human-readable mode like `ls -l`: `-rw-r--r--`, `drwxr-xr-x`
* `human-size` = `100B`, `25K`, `12M`, etc.
* Boolean `is-file` and `is-dir`
* Both `Str()` and `gist()` provide a single-line summary of the file:

```
-rw-r--r-- ctilmes/ctilmes 16K 2019-03-27 16:02 README.md
```

---

## Libarchive::Entry::Read

### Also includes:

* `data` - Returns the full binary data from the file as a `Buf`
* `content` - Decodes the content as a `utf-8` `Str` (just use `.data.decode` for other encodings)
* `extract` - Extracts the file to the filesystem

--

```
.content.put for archive-read('this.zip')
                 .grep({.pathname ~~ /README/});

.extract(perm => 0o600, :verbose) for archive-read('foo.tar.gz');

.extract(destpath => '/putithere') for archive-read('this.zip');
```

---

## Extract options

.left-col[
* `:extract-owner`
* `:extract-perm`
* `:extract-time`
* `:extract-no-overwrite`
* `:extract-unlink`
* `:extract-acl`
* `:extract-fflags`
]

.right-col[
* `:extract-xattr`
* `:extract-secure-symlinks`
* `:extract-secure-nodotdot`
* `:extract-secure-noabsolutepaths`
* `:extract-sparse`
* `:extract-clear-nochange-flags`
* `:verbose`
]

* `:destpath` - prepended to the pathname (you can also edit the pathname manually)

You can also use most of the `Libarchive::Entry` fields as `.extract()` options:

```
.extract(perm => 0o600);

.extract(pathname => .pathname ~ '.new');
```

---

## Libarchive::Write

```
use Libarchive::Write;

my $archive = Libarchive::Write.new('this.zip');

$archive.add('somefile');

$archive.close;
```

--

or just

```
with Libarchive::Write.new('foo.tar.gz') {
    .add: 'somefile';
    .close;
}
```

--

* Archive destination can be filename, `IO::Path`, `IO::Handle`, in-memory `Buf`, `Supplier` of blobs, `Channel` of blobs.
* If you don't specify a filename, you must specify a `format` (and optional `filter`).
* `.close` is critical.
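
---

## Libarchive::Write

A minimal sketch of the last two points: writing to a handle, where no format can be inferred from a filename, and making sure `.close` always runs. It assumes `Libarchive::Write.new` accepts the same `format` and `filter` named arguments that `archive-write` takes on later slides. `write-archive` is a hypothetical helper, not part of the module; `LEAVE` is plain Perl 6.

```perl6
use Libarchive::Write;

# Hypothetical helper; assumes .new takes format/filter like archive-write.
sub write-archive($dest, *@files, *%opts) {
    my $archive = Libarchive::Write.new($dest, |%opts);

    # Runs on normal exit and on exceptions; skipped if .new itself failed.
    LEAVE { .close with $archive }

    $archive.add($_) for @files;
}

write-archive($*OUT, 'somefile', format => 'paxr', filter => 'gzip');
```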
---

## Libarchive::Simple writing

```
use Libarchive::Simple;

with archive-write($*OUT, format => 'zip') {
    .add: 'afile';
    .add: 'somedir';
    .add: dir('somedir');
    .add: 'thisdir', dir('thisdir');
    .write: 'afile', "Some content\n";
    .write: 'bfile', buf8.new(1,2,3,4);
    .write: 'bigrandomfile', '/dev/urandom'.IO.open(:bin), size => 100_000;
    .mkdir: 'adir';
    .mkdir: 'bdir', perm => 0o700;
    .symlink: 'linked', 'adir/anotherfile';
    .symlink: 'anotherlink' => 'adir/yetanother';
    .close;
}
```

---

## Process archives in a pipeline

```
use Libarchive::Simple;

with archive-write($*OUT, format => 'zip') {
    .copy: archive-read($*IN, format => 'tar');
    .close;
}
```

--

```
use Libarchive::Simple;

with archive-write($*OUT, format => 'zip') {
    .write: 'NEWREADME', "This is my README\n";
    .write: 'LICENSE', "Special license file\n";
    .copy: archive-read($*IN, format => 'tar')
           .grep({ .pathname ~~ /good/})
           .map({ .pathname(.pathname.uc) })
           .map({ .uname('fred').perm(0o600)});
    .close;
}
```

---

## Libarchive::Archive

* There are some limitations to streaming archives. You must access the data at the right time.
* You can use `Libarchive::Archive` to slurp an archive, including file contents, into memory:

```
use Libarchive::Archive;

my $archive = Libarchive::Archive.new('this.tar.gz');

put $archive<...>;
```

--

```
use Libarchive::Simple;

my $archive = archive-slurp 'this.tar';

say $archive;        # Listing à la tar -t

put $archive<...>;   # Actual content

$archive<...>.content = "Change content\n";

$archive<...>:delete;

$archive.spurt: 'foo.zip';
```

---

## Libarchive::Archive

* You can also just create the memory archive from scratch

```
use Libarchive::Simple;

with archive-new() {
    .mkdir: 'adir';
    .write: 'adir/afile', "Some content\n";
    .add: 'fileinfilesystem';
    .spurt: 'this.zip';
}
```

* Not as efficient as streaming, but works fine for small things that fit into memory.

---

## format `raw`

* `libarchive` supports a special format called `raw` that just passes data through as a single fake file. This allows you to use the filters in isolation.

```
with archive-write($dest, format => 'raw', filter => 'gzip') {
    .write('ignore-filename', $source, size => ...);
    .close
}
```

or

```
with archive-read($source, format => 'raw') {
    my $header = .read; # Read and ignore the archive header
    while my $buf = .read-data(...) {
        ...do something with $buf...
    }
}
```

* But don't do it like that!

---

## Libarchive::Filter

* Those are packaged up into `Libarchive::Filter`.

Files:

```
use Libarchive::Filter;

archive-encode('Some content', 'file.gz', filter => 'gzip');

my $content = archive-decode('file.gz');
```

--

Memory buffers:

```
use Libarchive::Filter;

my $buf = archive-encode('Some content', filter => 'gzip');

my $content = archive-decode($buf);
```

* Can also use `IO::Path`, `IO::Handle`, Supplies, Channels, etc.

---

## Libarchive::Filter

Some common filters have been packaged up into simple shortcuts:

```
use Libarchive::Filter :gzip;

my $buf = gzip 'Some content';

my $content = gunzip $buf;
```

| use option | encode | decode |
| :-------- | :------ | :------- |
| `:gzip` | `gzip` | `gunzip` |
| `:compress` | `compress` | `uncompress` |
| `:bzip2` | `bzip2` | `bunzip2` |
| `:lz4` | `lz4` | `unlz4` |
| `:uuencode` | `uuencode` | `uudecode` |
| `:lzma` | `lzma` | `unlzma` |

* `use Libarchive::Filter :all;` to get all the shortcuts.
* Again, streaming directly to/from your final destination is more efficient than going through a memory buffer.

---

## IO::Handle example

```
use Libarchive::Simple;

with archive-write('file.tar.gz'.IO.open(:w:bin),
                   format => 'paxr', filter => 'gzip') {
    .write: 'afile', "This is some content\n";
    .close
}

.put for archive-read('file.tar.gz'.IO.open);
```

---

## Supply example

```
use Libarchive::Simple;

my $supplier = Supplier.new;

my $reader = start { .put for archive-read($supplier.Supply) }

with archive-write($supplier, format => 'paxr', filter => 'gzip') {
    .write: 'afile', "This is some content\n";
    .close
}

await $reader;
```

---

## Channel example

```
use Libarchive::Simple;

my $channel = Channel.new;

my $reader = start { .put for archive-read($channel) }

with archive-write($channel, format => 'paxr', filter => 'gzip') {
    .write: 'afile', "This is some content\n";
    .close
}

await $reader;
```

---

##.center[Thank You!]