27 Apr 2019 - by 'Maurits van der Schee'
The PHP 'explode' function splits a string into an array on a separator string. That is not enough to build a parser for a template language, because most languages allow strings to contain any character, including the separator. In this post we show a function that splits while respecting quotes, and one that removes the quotes while allowing escaped quotes inside the string.
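To make the limitation concrete, here is a small sketch of what plain 'explode' does with a quoted value that contains the separator. The input string is made up for illustration.

```php
<?php
// Hypothetical input: the first field is quoted and contains the separator.
$input = "'one|uno'|two";

// explode() knows nothing about quotes, so it also splits inside the quoted field.
$parts = explode('|', $input);

echo json_encode($parts) . "\n"; // ["'one","uno'","two"]
```

The quoted field is cut in two, which is exactly what the functions in this post avoid.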
Write a function or program that can split a string at each non-escaped occurrence of a separator character.
It should accept three input parameters: the string, the escape character and the separator character.
It should output a list of strings. (source)
The input string:
"one^|uno||three^^^^|four^^^|^cuatro|"
Should result in an array of 5 strings:
[ "one|uno", "", "three^^", "four^|cuatro", "" ]
In this example the '^' is the escape character and the '|' is the separator.
<?php
function token_with_escape($str, $escape = '^', $separator = '|')
{
    $tokens = [];
    $token = '';
    $escaped = false;
    for ($i = 0; $i < strlen($str); $i++) {
        $c = $str[$i];
        if (!$escaped) {
            if ($c == $escape) {
                // the next character is taken literally
                $escaped = true;
            } elseif ($c == $separator) {
                // an unescaped separator ends the current token
                $tokens[] = $token;
                $token = '';
            } else {
                $token .= $c;
            }
        } else {
            $token .= $c;
            $escaped = false;
        }
    }
    // the last token has no trailing separator
    $tokens[] = $token;
    return $tokens;
}

$input = "one^|uno||three^^^^|four^^^|^cuatro|";
$output = token_with_escape($input);
echo json_encode($output) . "\n";
And it does in fact output the expected array.
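The reverse direction can be sketched as well. The helper below, 'join_with_escape', is a hypothetical name and not part of the original code; it assumes single-character escape and separator values that differ from each other.

```php
<?php
// Hypothetical helper (not from the original post): escape every escape and
// separator character in each token, then join the tokens with the separator.
function join_with_escape(array $tokens, string $escape = '^', string $separator = '|'): string
{
    // strtr() with an array scans each string once, so characters that were
    // inserted by a replacement are never replaced a second time.
    $map = [
        $escape => $escape . $escape,
        $separator => $escape . $separator,
    ];
    $escaped = array_map(function ($token) use ($map) {
        return strtr($token, $map);
    }, $tokens);
    return implode($separator, $escaped);
}

$tokens = ["one|uno", "", "three^^", "four^|cuatro", ""];
echo join_with_escape($tokens) . "\n"; // one^|uno||three^^^^|four^^^|cuatro|
```

The result differs from the example input in one place: the input escapes the 'c' of 'cuatro', which is allowed but not needed, so this sketch leaves it unescaped. Splitting the joined string again yields the same five tokens.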
Write a function or program that can split a string at each occurrence of a separator character that is not within non-escaped quotes.
It should accept four input parameters: the string, the quote character, the escape character and the separator.
It should output a list of strings.
You need to avoid splitting within strings between quotes. So you want:
"'one|uno'||'three^'^''|'four^^^'^cuatro'|"
to be split into (step 1):
[ "'one|uno'", "", "'three^'^''", "'four^^^'^cuatro'", "" ]
and to be parsed into (step 2):
[ "one|uno", "", "three''", "four^'cuatro", "" ]
As you can see you never split within a quoted string.
This function will take care of the first step:
<?php
function token_with_quote($str, $quote = "'", $escape = '^', $separator = '|')
{
    $tokens = [];
    $token = '';
    $escaped = false;
    $quoted = false;
    $seplen = strlen($separator);
    for ($i = 0; $i < strlen($str); $i++) {
        $c = $str[$i];
        if (!$quoted) {
            if ($c == $quote) {
                $quoted = true;
            } elseif (substr($str, $i, $seplen) === $separator) {
                // strict comparison, so numeric-looking strings such as
                // "100" and "1e2" are not treated as equal
                $tokens[] = $token;
                $token = '';
                // skip the rest of a multi-character separator
                $i += $seplen - 1;
                continue;
            }
        } else {
            if (!$escaped) {
                if ($c == $quote) {
                    $quoted = false;
                } elseif ($c == $escape) {
                    $escaped = true;
                }
            } else {
                $escaped = false;
            }
        }
        // quotes and escapes stay in the token, they are removed in step 2
        $token .= $c;
    }
    $tokens[] = $token;
    return $tokens;
}

$input = "'one|uno'||'three^'^''|'four^^^'^cuatro'|";
$output = token_with_quote($input);
echo json_encode($output) . "\n";
This function will take care of the second step:
function token_unquote($arr, $quote = "'", $escape = '^')
{
    for ($i = 0; $i < count($arr); $i++) {
        $str = trim($arr[$i]);
        // only tokens that start and end with a quote are unquoted
        if (strlen($str) > 1 && $str[0] == $quote && $str[strlen($str) - 1] == $quote) {
            $escaped = false;
            $token = '';
            // strip the surrounding quotes
            $str = substr($str, 1, strlen($str) - 2);
            for ($j = 0; $j < strlen($str); $j++) {
                $c = $str[$j];
                if (!$escaped) {
                    if ($c == $escape) {
                        // drop the escape character, keep what follows
                        $escaped = true;
                        continue;
                    }
                } else {
                    $escaped = false;
                }
                $token .= $c;
            }
            $arr[$i] = $token;
        }
    }
    return $arr;
}

$input = "'one|uno'||'three^'^''|'four^^^'^cuatro'|";
$output = token_unquote(token_with_quote($input));
echo json_encode($output) . "\n";
And as expected the output is parsed correctly.
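For comparison, the two steps can also be merged into one pass. The function below, 'split_and_unquote', is a hypothetical variant and not the approach used above. It only supports a single-character separator, and it removes quote characters wherever they occur, whereas 'token_unquote' trims whitespace and only unquotes tokens that are quoted from start to end.

```php
<?php
// Hypothetical single-pass variant (an assumption, not the post's code):
// splits on unquoted separators and drops quotes and escapes as it goes.
function split_and_unquote(string $str, string $quote = "'", string $escape = '^', string $separator = '|'): array
{
    $tokens = [];
    $token = '';
    $quoted = false;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $c = $str[$i];
        if ($quoted) {
            if ($c === $escape && $i + 1 < $len) {
                // inside quotes an escape makes the next character literal
                $token .= $str[++$i];
            } elseif ($c === $quote) {
                // closing quote: leave quoted mode, do not keep the quote
                $quoted = false;
            } else {
                $token .= $c;
            }
        } elseif ($c === $quote) {
            // opening quote: enter quoted mode, do not keep the quote
            $quoted = true;
        } elseif ($c === $separator) {
            // separator outside quotes ends the current token
            $tokens[] = $token;
            $token = '';
        } else {
            $token .= $c;
        }
    }
    $tokens[] = $token;
    return $tokens;
}

$input = "'one|uno'||'three^'^''|'four^^^'^cuatro'|";
echo json_encode(split_and_unquote($input)) . "\n";
// ["one|uno","","three''","four^'cuatro",""]
```

For the example input the result matches the two-step version. The two-step version remains the better fit when the quoted tokens themselves are needed, for instance to tell a quoted string apart from a bare word.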
Enjoy!