Hand-rolling valid JSON in MySQL using GROUP_CONCAT
Posted: May 12th, 2015 | Author: Alex Korn | Filed under: Tutorials | Tags: JSON, multi-byte safe, mysql, PHP, string escaping, unicode | No Comments »The problem
Recently, I came across what started as a straightforward problem: In MySQL I was JOINing from Table A to Table B on a’s primary key, and wanted to display a list of all items in Table A, each followed by a list of all associated items in Table B.
My initial thought was to hand-roll my own JSON, so that the result of the query would have a few columns with the results of Table A, then a column with the JSON-encoded data from Table B. I realize that one should never hand-roll their own JSON, but there isn’t a native JSON encoder in MySQL, so I had to make do.
It was easy enough to come up with a basic JSON format using the following. I’ve done some pretty aggressive tabbing and newlines here to try to make the layers of functions as straightforward as possible; different arguments to each function align vertically.
SELECT a.aId, a.name, CONCAT( '[', GROUP_CONCAT( CONCAT( '[', b.bId, ', ', '\"', b.name, '\"', ']') ORDER BY b.bId ASC SEPARATOR ','), ']' ) AS bData FROM a JOIN b ON a.bId = b.bId GROUP BY a.aId |
(Note that I am not the first person to come up with this solution.)
However, if b.name
has a quotation mark (or a variety of other characters), this creates invalid JSON. How do we ensure that we always create valid JSON, even when there are special characters or unicode/multi-byte characters?
The solution
Ensuring we always have valid JSON was surprisingly easy – we can hexadecimal-encode b.name
in the MySQL query, which ensures that its character set is 0-9 or A-F, so it will always be properly contained within the quotation marks around it. We then convert it back into a normal string in the application code.
SELECT a.aId, a.name, CONCAT( '[', COALESCE( GROUP_CONCAT( CONCAT( '[', b.bId, ', ', '\"', HEX(b.name), '\"', ']') ORDER BY b.bId ASC SEPARATOR ','), ''), ']') AS bData FROM a JOIN b ON a.bId = b.bId GROUP BY a.aId |
So let’s say b.name
was something like '☃'
(because I know people are always naming their things [unicode Snowman]), the resulting hexadecimal representation is 27E2988327
. Opaque, but definitely JSON-safe!
Note that we also use COALESCE in case there are no associated items in Table B; otherwise CONCAT('[', NULL, ']'
gives us NULL
, not []
.
All we have left to do is convert the hexadecimal representation back into text in the application. Here’s a function in PHP that does the trick:
function hexToStr($hex) { $string = ''; for ($charIter = 0; $charIter < strlen($hex) - 1; $charIter += 2) { $string .= chr(hexdec($hex[$charIter] . $hex[$charIter + 1])); } return $string; } |
Happy coding, and remember to be safe and always properly escape your strings!
Feel free to ask questions or give feedback via Twitter.