{"id":341,"date":"2017-01-09T12:49:03","date_gmt":"2017-01-09T02:49:03","guid":{"rendered":"http:\/\/www.pbw.id.au\/blog\/?p=341"},"modified":"2018-05-22T20:48:53","modified_gmt":"2018-05-22T10:48:53","slug":"find-files-only-with-scm-directory-pruning","status":"publish","type":"post","link":"https:\/\/pbw.id.au\/blog\/2017\/01\/find-files-only-with-scm-directory-pruning\/","title":{"rendered":"find: files only with scm directory pruning"},"content":{"rendered":"<p>The version of find I&#8217;m discussing here is<br \/>\n<code>find (GNU findutils) 4.7.0-git<\/code><br \/>\nI use this pattern frequently\u2014<br \/>\n<code>$ find . &lt;conditions&gt; |xargs grep &lt;pattern&gt;<\/code><br \/>\nto find files containing, say, a regular expression. \u00a0If the search tree contains mercurial or git directories, I usually want to exclude their contents from the search.<\/p>\n<p><!--more-->The <em>-prune<\/em> action prevents a search from descending into the pruned directory, but I also want to strip out all directories, because the filenames are being fed into <em>xargs grep<\/em>. \u00a0So the command feeding the <em>xargs grep<\/em> looks something like\u2014<br \/>\n<code>$ find . -type d -name .hg -prune -o -type f -print<\/code><br \/>\nThis works well. \u00a0All directory names are suppressed, along with the files contained in the <em>.hg<\/em> directory.<\/p>\n<p>Because the default action on a find is <em>-print<\/em>, I often elide that action, so I end up with\u2014<br \/>\n<code>$ find . -type d -name .hg -prune -o -type f<\/code><br \/>\nLo and behold, the name of the pruned <em>.hg<\/em> directory appears in the list of files passed to <em>xargs<\/em>. \u00a0All other directory names are suppressed.<\/p>\n<p>What seems to be going on is this: the condition before the <em>-o<\/em> finds only directories named <em>.hg<\/em>. \u00a0Those it prunes, but the condition returns the names of the pruned directories. 
\u00a0The condition following the <em>-o<\/em> filters out all of the directories <strong>not<\/strong> named <em>.hg<\/em>. \u00a0The combined list of files and <em>.hg<\/em> directories (but not their contents) is passed to <em>xargs<\/em>.<\/p>\n<p>So how is it that the first version works as I want it to? \u00a0How are the names of the <em>.hg<\/em> directories suppressed?<\/p>\n<p>All I can surmise is that, in the absence of an explicit <em>-print<\/em>\u00a0action, the default <em>-print<\/em> applies to the conditions on both sides of the <em>-o<\/em>, but when <em>-print<\/em> is given explicitly after the <em>-o<\/em>, it applies only to the conditions there (the implied <em>-a<\/em> binds more tightly than <em>-o<\/em>), and the default is suppressed for the initial conditions.<\/p>\n<p>The man page says:<br \/>\n<code>If the whole expression contains no actions other than -prune or -print, -print is performed on all files for which the whole expression is true.<\/code><br \/>\nThat is ambiguous; in fact, it seems to be false. Neither of the versions above contains any actions apart from <em>-prune<\/em> and <em>-print<\/em>, and one doesn&#8217;t even contain a <em>-print<\/em>. \u00a0Yet they behave differently.<\/p>\n<p>The command<br \/>\n<code>$ find . \\( -type d -name .hg -prune -o -type f \\) -print<\/code><br \/>\nwhich pops the <em>-print<\/em> out from within the sub-expression, behaves the same as<br \/>\n<code>$ find . -type d -name .hg -prune -o -type f<\/code><br \/>\nso that seems to be what is effectively happening in the absence of a specific <em>-print<\/em> on the <em>or<\/em> condition. (Incidentally, you seem to be able to use <em>-or<\/em> in place of just <em>-o<\/em>.)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The version of find I&#8217;m discussing here is find (GNU findutils) 4.7.0-git I use this pattern frequently\u2014 $ find . &lt;conditions&gt; |xargs grep &lt;pattern&gt; to find files containing, say, a regular expression. 
\u00a0If the search tree contains mercurial or git directories, I usually want to exclude their contents from the search.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[19],"tags":[],"class_list":["post-341","post","type-post","status-publish","format-standard","hentry","category-code"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p8SCfl-5v","jetpack-related-posts":[{"id":33,"url":"https:\/\/pbw.id.au\/blog\/2014\/06\/zargrep-grep-files-in-a-zip-archive\/","url_meta":{"origin":341,"position":0},"title":"zargrep: grep files in a zip archive","author":"pbw","date":"Sat 21st Jun '14","format":false,"excerpt":"How do you search for strings within a zip archive? I'm tinkering with EPUB3 files, and I wanted to be able to find certain strings within .epub files, so I had a look around, and I immediately found zgrep and family. 
The trouble was that zgrep assumes a single zipped\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/pbw.id.au\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":30,"url":"https:\/\/pbw.id.au\/blog\/2015\/05\/tikadiff-graphical-diff-for-text-from-binary-files\/","url_meta":{"origin":341,"position":1},"title":"tikadiff: graphical diff for text from &#8220;binary&#8221; files","author":"pbw","date":"Sat 9th May '15","format":false,"excerpt":"Code The code is from the Downloads area of my Atlassian Bitbucket repository; see the README online. Version Control Systems (VCSs) VCSs like mercurial, git and bazaar (to mention only a few) are great for keeping track of changes to source files, but their utility doesn't stop there. \u00a0If you're\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/pbw.id.au\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":37,"url":"https:\/\/pbw.id.au\/blog\/2013\/05\/ant-edit-property-values\/","url_meta":{"origin":341,"position":2},"title":"Ant: edit property values","author":"pbw","date":"Thu 9th May '13","format":false,"excerpt":"One of the frustrations of using ant\u00a0was the difficulty of deriving one property value performing some sort of editing operation on an existing property value. The mapper task does a lot of grunt work for file names, but not for property values as such. 
A common requirement is to map\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/pbw.id.au\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":38,"url":"https:\/\/pbw.id.au\/blog\/2013\/05\/using-environment-plist-with-mountain-lion\/","url_meta":{"origin":341,"position":3},"title":"Using environment.plist with Mountain Lion","author":"pbw","date":"Sat 4th May '13","format":false,"excerpt":"UPDATE This post is now obsolete. For the preferred method in both Mountain Lion and Mavericks, see\u00a0 Setting environment variables in OS X Mountain Lion and Mavericks. With Mountain Lion (OS X 10.8), the environment settings from ~\/.MacOSX\/environment.plist are not taken into account when the background system environment is set\u2026","rel":"","context":"In &quot;Observations&quot;","block_context":{"text":"Observations","link":"https:\/\/pbw.id.au\/blog\/category\/personal\/observations\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":36,"url":"https:\/\/pbw.id.au\/blog\/2013\/05\/ant-process-elements-in-a-list\/","url_meta":{"origin":341,"position":4},"title":"Ant: process elements in a list","author":"pbw","date":"Sun 19th May '13","format":false,"excerpt":"I was looking for a way to process a list of items in an ant build file, similar to what you would do in Java with a construct like: for ( Element element : elements ) { \/\/ do stuff with element } The approach of XSLT, using recursive calls\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/pbw.id.au\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":34,"url":"https:\/\/pbw.id.au\/blog\/2013\/11\/setting-environment-variables-in-os-x-yosemite-and-mavericks\/","url_meta":{"origin":341,"position":5},"title":"Setting environment variables in  MacOS Big Sur","author":"pbw","date":"Thu 7th Nov 
'13","format":false,"excerpt":"This method uses launchctl to manage environment variables for programs invoked directly from Finder. \u00a0See the launchctl man page, especially the section LEGACY SUBCOMMANDS. \u00a0It's not entirely accurate, but that's not unusual. \u00a0The critical subcommands are getenv, setenv, and unsetenv. The man page indicates that the export subcommand is available;\u2026","rel":"","context":"In &quot;Code&quot;","block_context":{"text":"Code","link":"https:\/\/pbw.id.au\/blog\/category\/code\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/posts\/341","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/comments?post=341"}],"version-history":[{"count":4,"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/posts\/341\/revisions"}],"predecessor-version":[{"id":562,"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/posts\/341\/revisions\/562"}],"wp:attachment":[{"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/media?parent=341"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/categories?post=341"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pbw.id.au\/blog\/wp-json\/wp\/v2\/tags?post=341"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}