<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Valentin's Lab &#187; nul</title>
	<atom:link href="https://vaab.blog.kal.fr/tag/nul/feed/" rel="self" type="application/rss+xml" />
	<link>https://vaab.blog.kal.fr</link>
	<description>Ratiocination of an opensource techie</description>
	<lastBuildDate>Thu, 15 Nov 2018 08:04:35 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.1.1</generator>
	<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=vaab&amp;popout=1&amp;url=https%3A%2F%2Fvaab.blog.kal.fr%2F&amp;language=en_US&amp;category=text&amp;title=Valentin%27s+Lab&amp;description=Ratiocination+of+an+opensource+techie&amp;tags=blog" type="text/html" />
	<item>
		<title>bash lore: how to properly parse NUL separated fields</title>
		<link>https://vaab.blog.kal.fr/2015/01/03/bash-lore-how-to-properly-parse-nul-separated-fields/</link>
		<comments>https://vaab.blog.kal.fr/2015/01/03/bash-lore-how-to-properly-parse-nul-separated-fields/#comments</comments>
		<pubDate>Sat, 03 Jan 2015 09:26:28 +0000</pubDate>
		<dc:creator><![CDATA[vaab]]></dc:creator>
				<category><![CDATA[dev]]></category>
		<category><![CDATA[tip]]></category>
		<category><![CDATA[bash]]></category>
		<category><![CDATA[nul]]></category>

		<guid isPermaLink="false">http://vaab.blog.kal.fr/?p=513</guid>
		<description><![CDATA[As a lot of other part of bash, this is black magic. Lets suppose a friendly command that spits out NUL separated fields (as find -print0, shyaml get-values-0, ...). Which - may I insist - is the recommended way to &#8230;<p class="read-more"><a href="https://vaab.blog.kal.fr/2015/01/03/bash-lore-how-to-properly-parse-nul-separated-fields/">Read more &#187;</a></p>]]></description>
				<content:encoded><![CDATA[
<div class="document">


<!-- -*- mode: rst -*- -->
<p>As a lot of other part of bash, this is black magic.</p>
<p>Lets suppose a friendly command that spits out <tt class="docutils literal">NUL</tt> separated fields (as <tt class="docutils literal">find <span class="pre">-print0</span></tt>, <tt class="docutils literal">shyaml <span class="pre">get-values-0</span></tt>, ...). Which - may I insist - is the recommended way to communicate wild binary data in a solid way in bash.</p>
<p>How would you parse adequately each individual records by paquets ?</p>
<p>For the purpose of demonstration, lets use the fixed content of a simple <tt class="docutils literal">data.bin</tt> file
as our <tt class="docutils literal">NUL</tt>-separated input:</p>
<pre class="literal-block">
cat &lt;&lt;EOF | tr : &quot;\000&quot; &gt; /tmp/data.bin
a:1:b:2 3:c:4
  5:d:6\n7:e::f:9
EOF
</pre>
<p>Let's verify that we have our <tt class="docutils literal">NUL</tt> bytes:</p>
<pre class="literal-block">
$ cat /tmp/data.bin | hexdump -v -e '/1 &quot;%02X &quot;'
61 00 31 00 62 00 32 20 33 00 63 00 34 0A 20 20 35 00 64 00 36 5C 6E 37 00 65 00 00 66 00 39 0A
</pre>
<p>You have noticed that we have some values containing:</p>
<blockquote>
<ul class="simple">
<li>spaces (hex: <tt class="docutils literal">20</tt>),</li>
<li>line breaks (hex: <tt class="docutils literal">0A</tt>),</li>
<li>a <tt class="docutils literal">\</tt> followed by a <tt class="docutils literal">n</tt>.</li>
<li>a 0 sized value</li>
<li>a final value <tt class="docutils literal">9</tt> ending with a <tt class="docutils literal">0a</tt> and no final <tt class="docutils literal">00</tt>.</li>
</ul>
</blockquote>
<p>If using NUL separated fields is recommended, it's to support this kind of data.</p>
<p>I want the implementation of a function <tt class="docutils literal"><span class="pre">read-0</span></tt> that would allow this type of interaction:</p>
<pre class="literal-block">
$ cat /tmp/data.bin | while read-0 f1 f2; do
    echo &quot;f1: '$f1', f2: '$f2'&quot;
  done
f1: 'a', f2: '1'
f1: 'b', f2: '2 3'
f1: 'c', f2: '4
  5'
f1: 'd', f2: '6\n7'
f1: 'e', f2: ''
f1: 'f', f2: '9
'
</pre>
<div class="section" id="first-try">
<h3>First try</h3>
<p>Let's be naive, and we'll use <tt class="docutils literal">read f1 f2</tt>:</p>
<pre class="literal-block">
$ cat /tmp/data.bin | while read f1 f2; do echo &quot;f1: '$f1', f2: '$f2'&quot;; done
f1: 'a1b2', f2: '3c4'
f1: '5d6n7ef9', f2: ''
</pre>
<p>You can notice that:</p>
<blockquote>
<ul class="simple">
<li><tt class="docutils literal">NUL</tt> char where ignored for field separation</li>
<li>fields where separated upon <strong>consecutive</strong> space or return, it uses value stored in <tt class="docutils literal">IFS</tt> environment variable.</li>
<li>their are only 2 lines because the <tt class="docutils literal">\n</tt> was used to separate each record. We should use <tt class="docutils literal"><span class="pre">-d</span></tt> to specify the line delimiter.</li>
<li>Note that the <tt class="docutils literal">NUL</tt> chars are also extracted out of the data as variables don't support the NUL char.</li>
<li>The <tt class="docutils literal">\</tt> was eaten, because <tt class="docutils literal">read</tt> builtin parse and give it special meaning. We should use <tt class="docutils literal"><span class="pre">-r</span></tt> to avoid that.</li>
</ul>
</blockquote>
<p>But how should we provide the <tt class="docutils literal">NUL</tt> delimiter to the read builtin ? knowing that you can't put <tt class="docutils literal">NUL</tt> chars on the command line ? Hopefully I stumbled onto this blog post: <a class="reference external" href="http://transnum.blogspot.sg/2008/11/bashs-read-built-in-supports-0-as.html">http://transnum.blogspot.sg/2008/11/bashs-read-built-in-supports-0-as.html</a></p>
<p>Conclusion is that <tt class="docutils literal"><span class="pre">-d</span> ''</tt> should be understood magically by bash <tt class="docutils literal">read</tt> builtin to delimit lines with <tt class="docutils literal">NUL</tt>
characters.</p>
</div>
<div class="section" id="better-try">
<h3>Better try</h3>
<p>Let's apply our new acquired knowledge by trying <tt class="docutils literal"><span class="pre">IFS=$'\0'</span> read <span class="pre">-d</span> '' <span class="pre">-r</span> f1 f2</tt>:</p>
<pre class="literal-block">
$ cat /tmp/data.bin | while IFS=$'\0' read -d '' -r f1 f2; do echo &quot;f1: '$f1', f2: '$f2'&quot;; done
f1: 'a', f2: ''
f1: '1', f2: ''
f1: 'b', f2: ''
f1: '2 3', f2: ''
f1: 'c', f2: ''
f1: '4
  5', f2: ''
f1: 'd', f2: ''
f1: '6\n7', f2: ''
f1: 'e', f2: ''
f1: '', f2: ''
f1: 'f', f2: ''
</pre>
<p>That's much better. But notice that:</p>
<blockquote>
<ul class="simple">
<li>we didn't get anything in <tt class="docutils literal">$f2</tt>, that's normal: by specifying <tt class="docutils literal">NUL</tt> as line delimiter (with <tt class="docutils literal"><span class="pre">-d</span> ''</tt>) and having NUL as field delimiter (<tt class="docutils literal">IFS</tt>) we will be doomed to have one field per record. We will need to manage the repacking in a <tt class="docutils literal">while</tt> loop. This doesn't sound too difficult.</li>
<li>where's the final field <tt class="docutils literal">0A</tt> ? Hum, as there is no <tt class="docutils literal">NUL</tt> final character in the data, <tt class="docutils literal">read</tt> returned errlvl 1 on this last field but filled correctly the variable. A simple <tt class="docutils literal">echo $f1</tt> prints <tt class="docutils literal">9</tt> (if you use this form: <tt class="docutils literal">while <span class="pre">IFS=''</span> read <span class="pre">-d</span> '' <span class="pre">-r</span> f1 f2; do echo &quot;f1: '$f1', f2: <span class="pre">'$f2'&quot;;</span> done &lt; /tmp/data.txt</tt> to access variables of the <tt class="docutils literal">while</tt>).</li>
</ul>
</blockquote>
</div>
<div class="section" id="final-implementation">
<h3>Final Implementation ?</h3>
<p>So knowing this, here is the final implemetation of <tt class="docutils literal"><span class="pre">read-0</span></tt>:</p>
<pre class="literal-block">
read-0() {
    local eof
    eof=
    while [ &quot;$1&quot; -a -z &quot;$eof&quot; ]; do
        IFS='' read -r -d '' &quot;$1&quot; || eof=true
        shift
    done
    test &quot;$eof&quot; != true -o -z &quot;$1&quot;
}
</pre>
<p>Final ? It surely properly works for our specification test. But what happens if <tt class="docutils literal">EOF</tt> happens before
we have fed all the variables ?:</p>
<pre class="literal-block">
$ echo -n &quot;a&quot; | while read-0 f1 f2; do echo &quot;f1: '$f1', f2: '$f2'&quot;; done
$
</pre>
<p>Nothing is spit out, despite the fact that we have sent a character.</p>
<p>This is now a specification issue: Do we want <tt class="docutils literal"><span class="pre">read-0</span></tt> to return errorlevel 0 when it hits <tt class="docutils literal">EOF</tt>
while filing the variables ? Okay, but we said 0-sized string was a possible value... <tt class="docutils literal"><span class="pre">read-0</span></tt> in the
current specification knows it hit <tt class="docutils literal">EOF</tt> while filling variables as your first variable can be the
0-sized string. We could make a special case, but I want to be able to distinguish a last empty element from an element.</p>
<p>That <tt class="docutils literal"><span class="pre">read-0</span></tt>, in the actual specification, can't do it. But we can offer a slight change in the way you build your while loop to allow that parsing.</p>
</div>
<div class="section" id="correct-implementation">
<h3>Correct Implementation</h3>
<p>To fill partial records, will need another specification change as current implementation will fail whenever it encounters EOF. This is an incompatible specification issue. Aside from this, we need also to take care to actually set the value of the remaining fields to the empty string. This will require to use another version of <tt class="docutils literal"><span class="pre">read-0</span></tt>:</p>
<pre class="literal-block">
read-0() {
    local eof
    eof=
    while [ &quot;$1&quot; ]; do
        IFS='' read -r -d '' -- &quot;$1&quot; || eof=true
        shift
    done
    test &quot;$eof&quot; != true
}
</pre>
<p>So this would work with <tt class="docutils literal"><span class="pre">read-0</span></tt>:</p>
<pre class="literal-block">
$ echo -n a | tr :  '\000' | {  eof= ; while [ -z $eof ]; do read-0 f1 f2 || eof=true; echo &quot;f1: '$f1', f2: '$f2'&quot;; done  }
f1: 'a', f2: ''

$ echo -n a: | tr :  '\000' | {  eof= ; while [ -z $eof ]; do read-0 f1 f2 || eof=true; echo &quot;f1: '$f1', f2: '$f2'&quot;; done  }
f1: 'a', f2: ''
</pre>
<p>Basically, this construct allows a last round in the loop after detecting EOF... and achieve the starting spec:</p>
<pre class="literal-block">
$ cat /tmp/data.bin | {  eof= ; while [ -z $eof ]; do read-0 f1 f2 || eof=true; echo &quot;f1: '$f1', f2: '$f2'&quot;; done  }
f1: 'a', f2: '1'
f1: 'b', f2: '2 3'
f1: 'c', f2: '4
  5'
f1: 'd', f2: '6\n7'
f1: 'e', f2: ''
f1: 'f', f2: '9
'
</pre>
<p>Trivial ?</p>
<p>Happy hacking.</p>
</div>
</div>
 <p><a href="https://vaab.blog.kal.fr/?flattrss_redirect&amp;id=513&amp;md5=57730888592c27e128ffadf2e707ab79" title="Flattr" target="_blank"><img src="https://vaab.blog.kal.fr/wp-content/plugins/flattr/img/flattr-badge-large.png" alt="flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>https://vaab.blog.kal.fr/2015/01/03/bash-lore-how-to-properly-parse-nul-separated-fields/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=vaab&amp;popout=1&amp;url=https%3A%2F%2Fvaab.blog.kal.fr%2F2015%2F01%2F03%2Fbash-lore-how-to-properly-parse-nul-separated-fields%2F&amp;language=en_GB&amp;category=text&amp;title=bash+lore%3A+how+to+properly+parse+NUL+separated+fields&amp;description=As+a+lot+of+other+part+of+bash%2C+this+is+black+magic.+Lets+suppose+a+friendly+command+that+spits+out+NUL+separated+fields+%28as+find+-print0%2C+shyaml+get-values-0%2C+...%29.+Which...&amp;tags=bash%2Cnul%2Cblog" type="text/html" />
	</item>
	</channel>
</rss>
