If you use BeautifulSoup to get at the contents of the <script> tag,
the json module can do the rest with a bit of string magic:
jsonValue = '{%s}' % (textValue.split('{', 1)[1].rsplit('}', 1)[0],)
value = json.loads(jsonValue)
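The .split() and .rsplit() combination above cuts the JavaScript text block at its first { and its last }; what lies between them should be your object definition. Adding the braces back gives a string we can pass to json.loads() to get a Python structure. This only works when the object literal is also valid JSON (double-quoted keys and strings, no trailing commas or comments).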
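As a minimal sketch, the same two steps can be wrapped in a small helper. The function name `extract_js_object` and the shortened sample string are hypothetical, made up for illustration, and the sketch assumes the script text contains exactly one top-level object literal that is valid JSON:

```python
import json


def extract_js_object(script_text):
    """Return the outermost {...} literal in script_text as a Python object.

    Hypothetical helper: assumes the literal is strict JSON.
    """
    # Keep everything between the first '{' and the last '}'.
    inner = script_text.split('{', 1)[1].rsplit('}', 1)[0]
    # Put the braces back so the string is a complete JSON object.
    return json.loads('{%s}' % inner)


# Shortened sample input, standing in for the text of a <script> tag.
sample = 'var page_data = {"default_sku": "SKU12345", "get_together": {"id": 1234}};'
data = extract_js_object(sample)
print(data['get_together']['id'])
```

If the text contains no { at all, the [1] index raises IndexError, so callers may want to catch that.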
Demo:
>>> import json
>>> textValue = '''
... var page_data = {
... "default_sku" : "SKU12345",
... "get_together" : {
... "imageLargeURL" : "http://null.null/pictures/large.jpg",
... "URL" : "http://null.null/index.tmpl",
... "name" : "Paints",
... "description" : "Here is a description and it works pretty well",
... "canFavorite" : 1,
... "id" : 1234,
... "type" : 2,
... "category" : "faded",
... "imageThumbnailURL" : "http://null.null/small9.jpg"
... }
... };
... '''
>>> jsonValue = '{%s}' % (textValue.split('{', 1)[1].rsplit('}', 1)[0],)
>>> value = json.loads(jsonValue)
>>> value
{u'default_sku': u'SKU12345', u'get_together': {u'category': u'faded', u'canFavorite': 1, u'name': u'Paints', u'URL': u'http://null.null/index.tmpl', u'imageThumbnailURL': u'http://null.null/small9.jpg', u'imageLargeURL': u'http://null.null/pictures/large.jpg', u'type': 2, u'id': 1234, u'description': u'Here is a description and it works pretty well'}}
>>> import pprint
>>> pprint.pprint(value)
{u'default_sku': u'SKU12345',
 u'get_together': {u'URL': u'http://null.null/index.tmpl',
                   u'canFavorite': 1,
                   u'category': u'faded',
                   u'description': u'Here is a description and it works pretty well',
                   u'id': 1234,
                   u'imageLargeURL': u'http://null.null/pictures/large.jpg',
                   u'imageThumbnailURL': u'http://null.null/small9.jpg',
                   u'name': u'Paints',
                   u'type': 2}}
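One caveat with cutting at the last }: if the script contains more code with braces after the object, the slice picks up that code too and json.loads() fails. A possible alternative, not part of the approach above, is to let the standard library's json.JSONDecoder.raw_decode() stop at the end of the first object on its own. The helper name `decode_first_object` and the sample string are hypothetical, and the sketch still assumes the literal is valid JSON:

```python
import json


def decode_first_object(script_text):
    """Decode the JSON object that starts at the first '{' in script_text.

    Hypothetical helper: anything after the object's closing brace is ignored.
    """
    start = script_text.index('{')
    # raw_decode returns the decoded value and the index where parsing stopped.
    obj, _end = json.JSONDecoder().raw_decode(script_text, start)
    return obj


# Sample with extra braces after the object, which would break the rsplit cut.
sample = 'var page_data = {"id": 1234}; function f() { return 1; }'
print(decode_first_object(sample))
```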