{"id":33,"date":"2014-06-21T10:22:00","date_gmt":"2014-06-21T00:22:00","guid":{"rendered":""},"modified":"2018-05-22T20:53:12","modified_gmt":"2018-05-22T10:53:12","slug":"zargrep-grep-files-in-a-zip-archive","status":"publish","type":"post","link":"https:\/\/pbw.id.au\/blog\/2014\/06\/zargrep-grep-files-in-a-zip-archive\/","title":{"rendered":"zargrep: grep files in a zip archive"},"content":{"rendered":"<p>How do you search for strings within a zip archive?<\/p>\n<p>I&#8217;m tinkering with EPUB3 files, and I wanted to be able to find certain strings within .epub files, so I had a look around, and I immediately found <i><b>zgrep<\/b><\/i> and family. The trouble was that zgrep assumes a single gzip-compressed file, not a zip archive.<\/p>\n<p>So, without further ado, I wrote the following script, which I called, naturally, <i>zipgrep<\/i>. It uses <i>grep<\/i> and <i>unzip<\/i>, which it assumes to be available on the PATH. \u00a0Not wanting to have to pick through the argument list, I decided to mark the end of arguments to <i>grep<\/i> with the traditional &#8216;<b>--<\/b>&#8217;, after which I could stack up as many zip file names as I liked.<\/p>\n<p><!--more-->It was a case of not enough time in the library; or Google, in this case. \u00a0As soon as I had it working, I discovered the original <i><b><a href=\"http:\/\/www.info-zip.org\/mans\/zipgrep.html\">zipgrep<\/a><\/b><\/i>.<\/p>\n<p>All was not lost. \u00a0The original\u00a0<i>zipgrep<\/i> handles a single archive using <i>egrep<\/i> and <i>unzip<\/i>, with the nice wrinkle of optional sets of filenames to include in, or exclude from, the search. \u00a0However, I liked the ability to search multiple zip archives, and grep can be converted to any of its relatives with an appropriate flag, so I decided to hang on to <i>son of zipgrep<\/i>. 
\u00a0All I needed was a new name: hence <i><b><a href=\"http:\/\/pbw.id.au\/bin\/zargrep\">zargrep<\/a><\/b><\/i>.<\/p>\n<p>You can retrieve it <a href=\"http:\/\/pbw.id.au\/bin\/zargrep\">here<\/a>. It has been tested on OS X against multiple EPUB3 files.<\/p>\n<p>Because jar files are also zip files, this should work for them too, but I haven&#8217;t yet tried it.<\/p>\n<pre> #! \/bin\/bash  \n   \n # Greps files in a zip archive.  \n # Same argument sequence as for grep, except that  \n # zip file arguments must be separated from flags and  \n # patterns by --. If no -- is found in the argument list, returns error.  \n # Needs bash: uses arrays, (( )) loops and substring expansion.  \n   \n usage() {  \n   echo Usage: &gt;&amp;2  \n   echo \"$0\" \"&lt;grep flags&gt; &lt;pattern&gt; -- zipfiles ...\" &gt;&amp;2  \n }  \n   \n declare -a args  \n   \n # Collect the grep flags and pattern up to the -- marker.  \n for (( i=0; $# &gt; 0; i++ ))  \n do  \n   if [ \"$1\" != \"--\" ]; then  \n     args[$i]=\"$1\"  \n     shift  \n   else  \n     filesmarked=1  \n     shift  \n     break  \n   fi  \n done  \n   \n if [ -z \"$filesmarked\" ]; then  \n   echo \"No '--' marker for zipfiles args.\" &gt;&amp;2  \n   usage  \n   exit 1  \n fi  \n   \n tmpfile=\/tmp\/zipgrep$$  \n rm -rf \"$tmpfile\"  \n mkdir \"$tmpfile\"  \n   \n trap 'rm -rf \"$tmpfile\"' EXIT  \n   \n wd=$(pwd)  \n cd \"$tmpfile\"  \n   \n while [ $# -gt 0 ]; do  \n   zipfile=\"$1\"  \n   zfile=\"$1\"  \n   shift  \n   # If zipfile is not absolute, set it relative to wd  \n   if [ \"${zipfile:0:1}\" != \/ ]; then  \n     zipfile=\"$wd\/${zipfile}\"  \n   fi  \n   unzip \"$zipfile\" &gt;\/dev\/null  \n   result=$(find . -type f -print0|xargs -0 grep \"${args[@]}\")  \n   if [ -n \"$result\" ]; then  \n     echo \"zip: $zfile\"  \n     echo \"$result\"  \n   fi  \n   # Start the next archive with an empty scratch directory.  \n   cd \"$wd\"  \n   rm -rf \"$tmpfile\"  \n   mkdir \"$tmpfile\"  \n   cd \"$tmpfile\"  \n done  \n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>How do you search for strings within a zip archive? 
I&#8217;m tinkering with EPUB3 files, and I wanted to be able to find certain strings within .epub files, so I had a look around, and I immediately found zgrep and family. The trouble was that zgrep assumes a single zipped file, not an archive. So, &hellip; <a href=\"https:\/\/pbw.id.au\/blog\/2014\/06\/zargrep-grep-files-in-a-zip-archive\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;zargrep: grep files in a zip archive&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[19],"tags":[],"class_list":["post-33","post","type-post","status-publish","format-standard","hentry","category-code"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8SCfl-x","jetpack-related-posts":[{"id":341,"url":"https:\/\/pbw.id.au\/blog\/2017\/01\/find-files-only-with-scm-directory-pruning\/","url_meta":{"origin":33,"position":0},"title":"find: files only with scm directory pruning","author":"admin","date":"Mon 9th Jan '17","format":false,"excerpt":"The version of find I'm discussing here is find (GNU findutils) 4.7.0-git I use this pattern frequently\u2014 $ find . <conditions> |xargs grep <pattern> to find files containing, say, a regular expression. 
\u00a0If the search tree contains mercurial or git directories, I usually want to exclude their contents from the\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/pbw.id.au\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":30,"url":"https:\/\/pbw.id.au\/blog\/2015\/05\/tikadiff-graphical-diff-for-text-from-binary-files\/","url_meta":{"origin":33,"position":1},"title":"tikadiff: graphical diff for text from &#8220;binary&#8221; files","author":"pbw","date":"Sat 9th May '15","format":false,"excerpt":"Code The code is from the Downloads area of my Atlassian Bitbucket repository; see the README online. Version Control Systems (VCSs) VCSs like mercurial, git and bazaar (to mention only a few) are great for keeping track of changes to source files, but their utility doesn't stop there. \u00a0If you're\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/pbw.id.au\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":38,"url":"https:\/\/pbw.id.au\/blog\/2013\/05\/using-environment-plist-with-mountain-lion\/","url_meta":{"origin":33,"position":2},"title":"Using environment.plist with Mountain Lion","author":"pbw","date":"Sat 4th May '13","format":false,"excerpt":"UPDATE This post is now obsolete. For the preferred method in both Mountain Lion and Mavericks, see\u00a0 Setting environment variables in OS X Mountain Lion and Mavericks. 
With Mountain Lion (OS X 10.8), the environment settings from ~\/.MacOSX\/environment.plist are not taken into account when the background system environment is set\u2026","rel":"","context":"In &quot;Observations&quot;","block_context":{"text":"Observations","link":"https:\/\/pbw.id.au\/blog\/category\/personal\/observations\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":31,"url":"https:\/\/pbw.id.au\/blog\/2015\/03\/help-for-digest-checking\/","url_meta":{"origin":33,"position":3},"title":"Help for digest checking","author":"pbw","date":"Fri 20th Mar '15","format":false,"excerpt":"Updated 2018-02-14 It's pretty important to check the digests of software you download. \u00a0When a downloaded file is accompanied by a signature file, for example a gnupg .asc file, you can verify the signature with various tools. \u00a0Often though, a download site will include the MD5 or SHA1 digest hash\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/pbw.id.au\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":34,"url":"https:\/\/pbw.id.au\/blog\/2013\/11\/setting-environment-variables-in-os-x-yosemite-and-mavericks\/","url_meta":{"origin":33,"position":4},"title":"Setting environment variables in  MacOS Big Sur","author":"pbw","date":"Thu 7th Nov '13","format":false,"excerpt":"This method uses launchctl to manage environment variables for programs invoked directly from Finder. \u00a0See the launchctl man page, especially the section LEGACY SUBCOMMANDS. \u00a0It's not entirely accurate, but that's not unusual. \u00a0The critical subcommands are getenv, setenv, and unsetenv. 
The man page indicates that the export subcommand is available;\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/pbw.id.au\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":36,"url":"https:\/\/pbw.id.au\/blog\/2013\/05\/ant-process-elements-in-a-list\/","url_meta":{"origin":33,"position":5},"title":"Ant: process elements in a list","author":"pbw","date":"Sun 19th May '13","format":false,"excerpt":"I was looking for a way to process a list of items in an ant build file, similar to what you would do in Java with a construct like: for ( Element element : elements ) { \/\/ do stuff with element } The approach of XSLT, using recursive calls\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/pbw.id.au\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/posts\/33","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/comments?post=33"}],"version-history":[{"count":5,"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/posts\/33\/revisions"}],"predecessor-version":[{"id":567,"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/posts\/33\/revisions\/567"}],"wp:attachment":[{"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/media?parent=33"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/categories?post=33"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/tags?post=33"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{
rel}","templated":true}]}}