Is it possible to detect bad quotes in a poorly-formed JSON string and then parse the string as JSON properly? By Admin

I'm using Rails 4.2.3. I'm parsing JSON sent by a third party (I'm not in control of how this JSON gets formed). I noticed taht they are very occasionally sending poorly JSON, like so

    '{"DisplayName":""fat" Tony Elvis ","Time":null,"OverallRank":19,"AgeRank":4}'
Notice in the above, the word "fat", with the quotes, screws up the rest of the JSON. In my Rails code, I parse the JSON, like so ...

    json_data = JSON.parse(content_str)
Although I can catch errors when JSON doesn't parse properly, I'm wondering if there's a way to account for these poorly placed quotes, correct them so that the above string does not constitute bad JSON, and then parse the JSON properly


  •  Open
  •  27-07-2016
  •  4
  •  285

Answers ( 4 )

 Posted on 27-07-2016

If you know exactly what malformations might occur, you might manage to do some crazy workarounds like using regex to match and correct the string before parsing it as json:

(?:")([^,:"]*"[^,:"]*"[^,:"]*)(?:")
http://regexr.com/3dpj1

But this is definitely something you shouldn't do if not absolutely necessary!! You better try to contact the source owner and make him escape the quotes correctly!

edit: Here is a full POC, where unescaped quotes are simply removed: https://jsfiddle.net/MattDiMu/y8khwfw6/

 Posted on 27-07-2016

Using regex you could check before parsing for double quotes \"\" followed by some word \w+ and ending with \". If you find it use gsub to replace the phrase with single quotes and a lookback "\'\\1\'.

t='{"DisplayName":""fat" Tony Elvis ","Time":null,"OverallRank":19,"AgeRank":4}'
t=t.gsub(/\"\"(\w+)\"/, '"\'\\1\'') 

 Posted on 27-07-2016

I think you have to make/have some assumptions about the "json" that are always true. If for example the json objects always have a fixed order of attributes, that could help very much, especially if only single attributes are problematic.

I would try to match

{"DisplayName":"(.*?)","Time":(null|"[^"]*"),"OverallRank":(\d+),"AgeRank":(\d+)}
and then replace with the help of some "fixer" function, that probably just uses the capture groups and re-encodes some ad-hoc created object back into actually valid json. One variation would be, to expand the (.*?) to only match when something is wrong.

However, the whole approach gets more complicated with optional attributes and even more so with a flexible order of attributes (all of which might still be manageable).

As you very well may have noticed, this only works if the assumption at the top is true. Depending on the assumptions you can make, the solution can be very simple. However, it all gets unwieldy, if these malformed Elements are completely irregular. So ... best of luck, I guess. Please post the assumptions about the json you believe to be true, if you need further assistance. If there are none however, a program would have to guess, what's actually meant. I mean, someone could mean:

{
 "

 Posted on 27-07-2016

Try handling error using begin rescue exception handling like,

begin 
 json_data = JSON.parse(content_str)
rescue =>e
 Rails.logger.debug e
end

This will raise exception when there is invalid JSON format and inform source owner to modify the JSON.