[Repost] Notes on Scraping Taobao Data

Author: stoat  Published: 2014-07-08  Category: Tech

Near the bottom of a Taobao product detail page there is the following script:

<script> window.App = (window.App || {}); window.App.data = { images: [ "http://q.i02.wimg.taobao.com/bao/uploaded/i3/T1O2eYXjdsXXXQBm.Y_030602.jpg" , "http://q.i01.wimg.taobao.com/bao/uploaded/i1/T1QMCxXi0fXXcyDbPb_093241.jpg" , "http://q.i02.wimg.taobao.com/bao/uploaded/i2/670343779/T29dyXXchbXXXXXXXX_!!670343779.png" , "http://q.i02.wimg.taobao.com/bao/uploaded/i3/670343779/T2419rXdXbXXXXXXXX_!!670343779.png" , "http://q.i03.wimg.taobao.com/bao/uploaded/i2/T1eYO0XXhkXXaJ60na_120515.jpg" ], link: "http://a.m.tmall.com/i9642784141.htm", price: "¥202.00", tmall: true, itemId: 9642784141, hasProps: false, taoPlus: true, imgScale: 1, reAddCart: false , reAddFav: false , prevSkuId: "", logAjaxUrl: "ajax/pds.do", descAjaxUrl: "http://a.m.tmall.com/ajax/desc_list.do?item_id=xxx&ps=800&sid=6c4d94e0e89bca18", propsAjaxUrl: "http://a.m.tmall.com/ajax/sku.do?item_id=xxx&sid=6c4d94e0e89bca18", reviewAjaxUrl: "http://a.m.tmall.com/ajax/rate_list.do?item_id=xxx&sid=6c4d94e0e89bca18", loginUrl: "http://login.m.taobao.com/login.htm?tpl_redirect_url=http%3A%2F%2Fa.m.tmall.com%2Fi9642784141.htm%3Fsid%3D6c4d94e0e89bca18%26pds%3Dfromauc%2523h%2523shop&sid=6c4d94e0e89bca18", addFavUrl: "http://fav.m.taobao.com/favorite/to_collection.htm?itemNumId=xxx&xid=0db2&pds=addfav%23h%23detail&sid=6c4d94e0e89bca18", addCartUrl: "http://cart.m.taobao.com/ajax.do?fun=add&item_id=xxx&ticket=6c4d94e0e89bca18&pds=addcart%23h%23detail&sid=6c4d94e0e89bca18", cleannowUrl: "http://cart.m.taobao.com/my_cart.htm?pds=cleannow%23h%23cart&sid=6c4d94e0e89bca18", myCartUrl: "http://cart.m.taobao.com/my_cart.htm?sid=6c4d94e0e89bca18", recommendAjaxUrl: "http://a.m.tmall.com/ajax/get_related.do?item_id=xxx&sid=6c4d94e0e89bca18" } </script><script src="http://a.tbcdn.cn/mw/app/detail/h5/detail.min.js"></script>
reviewAjaxUrl: "http://a.m.tmall.com/ajax/rate_list.do?item_id=xxx

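This is the Ajax URL that returns the review data; item_id is the product's ID. Requesting that URL as-is, however, returns only a handful of reviews. So the problem becomes working out which other parameters the URL accepts.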
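If you want to pull that URL out of the page programmatically rather than by eye, something like the following might do. This is only a sketch: extractAppDataField is a name I made up, and it assumes the field is written as key: "value" with double quotes and no escaped quotes inside, as in the script above.

```javascript
// Sketch: extract one string-valued field from the inline window.App.data script.
// extractAppDataField is a hypothetical helper, not part of any Taobao API.
function extractAppDataField(html, key) {
  // Assumes `key` is a plain identifier and the value is a double-quoted
  // string without escaped quotes.
  const pattern = new RegExp("\\b" + key + '\\s*:\\s*"([^"]*)"');
  const match = pattern.exec(html);
  return match ? match[1] : null;
}

// A shortened stand-in for the real page source.
const sampleHtml =
  "<script> window.App.data = { itemId: 9642784141, " +
  'reviewAjaxUrl: "http://a.m.tmall.com/ajax/rate_list.do?item_id=xxx&sid=6c4d94e0e89bca18" } </script>';

console.log(extractAppDataField(sampleHtml, "reviewAjaxUrl"));
```

A regular expression is brittle compared with actually parsing the script, but the object literal is not valid JSON (unquoted keys), so JSON.parse is not an option here.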
Opening http://a.tbcdn.cn/mw/app/detail/h5/detail.min.js shows a minified mess that is impossible to read. Time to find a JS beautifier.

A Google search turned up this online tool: http://jsbeautifier.org

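After reading through the beautified code, I had a rough idea of how it was organized. Searching for reviewAjaxUrl leads to a function whose body contains the following nested function: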

getData: function (b, c) {
    var d = this;
    this.xhr && this.xhr.abort(), this.xhr = a.ajax({
        url: this.url,
        data: {
            rateRs: this.typeMap[b],
            p: c,
            ps: 10
        },
        dataType: "json",
        success: function (a) {
            d.type = b, d.data[b].index = c, d.data[b].total = a && a.total || 0, d.data[b].pages[c] = a && d._convert(a.items || []), d.render(b, c)
        },
        error: function () {}
    })
},

The three parameters in data here (rateRs, p, and ps) are the ones we were looking for.
Let's try adding them: http://a.m.tmall.com/ajax/rate_list.do?item_id=xxxx&p=2&ps=15
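Rather than gluing the query string together by hand, the URL can be assembled with the standard URL class. A minimal sketch; buildReviewUrl is a hypothetical helper name, and only the item_id, p, and ps parameters seen above are used:

```javascript
// Hypothetical helper: build the review URL from the query parameters found in getData.
function buildReviewUrl(itemId, page, pageSize) {
  const url = new URL("http://a.m.tmall.com/ajax/rate_list.do");
  url.searchParams.set("item_id", String(itemId)); // product ID
  url.searchParams.set("p", String(page));         // p parameter
  url.searchParams.set("ps", String(pageSize));    // ps parameter
  return url.toString();
}

console.log(buildReviewUrl("xxxx", 2, 15));
// prints http://a.m.tmall.com/ajax/rate_list.do?item_id=xxxx&p=2&ps=15
```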

The response looks like this (JSON):

{"index":2,"items":[{"annoy":0,"buyer":"xxx","credit":91,"date":"2012-07-17","deal":"","rateId":19890633089,"text":"发货超快,经济实惠,以后就买你家产品了","type":0},{"annoy":0,"buyer":"xxx","credit":91,"date":"2012-07-16","deal":"","rateId":19859269704,"reply":"感谢您对我们的肯定和支持,非常感谢您对我们的服务认可,衷心的希望您能常来我们店!!!!o(∩_∩)o","text":"不好意思啊,确认晚了,发货速度很快,客服的态度也很好,特别是小依,有什么问题总能很快的得到答复.这已经是第三次购买了.","type":0},{"annoy":0,"buyer":"xxx","credit":501,"date":"2012-07-14","deal":"","rateId":19823225928,"reply":"亲,十一坊的纯的乳清蛋白粉溶解度是很好的,如果您说的其它品牌溶解速度快,那是因为那款产品里面加了速溶剂,而速溶剂属于添加剂呀,所以希望你能理解这点。另外十一坊纯乳清蛋白粉的原料是新西兰原产地纯牛奶提炼的,不知道怎么亲会闻出来羊奶味,我们有相关资质可以作证的,希望亲在好好闻闻哈!","text":"蛋白粉是我用过的的,外观最细腻的,泡沫最多的一款蛋白粉,同样也是奶味最浓的一款蛋白粉,有很浓的羊奶膻味,,(注意不是牛奶,不知为什么,)但是和美国大品牌蛋白粉比起来溶解度差,美国大品牌都是见水吉化,这款差点","type":0},{"annoy":0,"buyer":"xxx","credit":11,"date":"2012-07-14","deal":"","rateId":19816990927,"text":"好","type":0},{"annoy":0,"buyer":"xxx","credit":91,"date":"2012-07-13","deal":"","rateId":19801085374,"reply":"亲,多多关注我们店铺哦,还有更多优惠进行中~~~","text":"第二次购买了,不错的卖家。以后还会回来买。","type":0},{"annoy":0,"buyer":"xxx09","credit":11,"date":"2012-07-12","deal":"","rateId":19756966488,"reply":"感谢您对我们的肯定和支持,衷心的希望您能常来我们店!","text":"很好,有赠品和挂奖卡带金卡","type":0},{"annoy":0,"buyer":"xxx","credit":91,"date":"2012-07-11","deal":"","rateId":19726634975,"reply":"非常感谢您对店的支持~~~","text":"hao dong 
xi","type":0},{"annoy":0,"buyer":"xxx","credit":251,"date":"2012-07-09","deal":"","rateId":19678002719,"reply":"感谢您对我们的肯定和支持,衷心的希望您能常来我们店!","text":"很好,超划算,下次再来,谢谢小礼物","type":0},{"annoy":0,"buyer":"xxx","credit":251,"date":"2012-07-09","deal":"","rateId":19675522722,"text":"第一次来,卖家服务真好,还送了礼物,哈,还中了三等奖,别然小礼品一份,但也可看出卖家的用心,好评,会常来。","type":0},{"annoy":1,"buyer":"a**0","credit":41,"date":"2012-07-09","deal":"","rateId":19673848752,"text":"妈妈吃了,感觉还可以。坚持吃看看效果了哇。谢谢卖家的晓礼物哦","type":0},{"annoy":0,"buyer":"xxx","credit":41,"date":"2012-07-09","deal":"","rateId":19670631439,"reply":"亲,多多关注我们店铺哦,还有更多优惠进行中~~~","text":"一如即往的好,老顾客了","type":0},{"annoy":0,"buyer":"xxxx","credit":282,"date":"2012-07-06","deal":"","rateId":19611318847,"reply":"亲们若有疑问请拨打十一坊免费营养咨询热线:800 888 9988 ,非常感谢您对店的支持,祝您天天好心情~~","text":"发货速度很快,物流也很好。虽没查证,但感觉是正品,买家服务态度也很好,落发了一个摇摇杯说要给我寄过来,正在吃希望有效果,有效果再来喽!全5分好评了。。。。。","type":0},{"annoy":1,"buyer":"3**王","credit":4,"date":"2012-07-06","deal":"","rateId":19607636363,"text":"可以","type":0},{"annoy":0,"buyer":"xxx","credit":152,"date":"2012-07-06","deal":"","rateId":19596872786,"text":"蛋白粉一直在吃非常好","type":0},{"annoy":0,"buyer":"xxx","credit":152,"date":"2012-07-03","deal":"","rateId":19522938766,"text":"谢谢送的小赠品,还没喝不过看上去还不错","type":0}],"total":9}
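After changing the parameters a few times, I came to the following conclusions:
p is the page number;

ps is the page size;

total in the response is the total number of pages (so it changes with ps).

rateRs is presumably the rating type, with these values: all, good, ok, bad, 1, 0, -1. Adding this parameter had no effect, though, and I don't know why.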

With that, we have everything needed to scrape all of a product's reviews, page by page.
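The paging loop could be sketched roughly like this. Several things are my assumptions rather than anything confirmed above: that pages start at p=1, that total can be trusted as the page count on every response, and the names fetchAllReviews and fetchJsonOverHttp. The fetcher is passed in so the loop can be tried without hitting the network.

```javascript
// Sketch: walk the review pages until the page count reported in `total` is reached.
// fetchJson is any function that takes a URL and resolves to the parsed JSON body.
async function fetchAllReviews(itemId, pageSize, fetchJson) {
  const reviews = [];
  let totalPages = 1; // placeholder, replaced by `total` from each response
  // Assumption: the first page is p=1.
  for (let page = 1; page <= totalPages; page++) {
    const url =
      "http://a.m.tmall.com/ajax/rate_list.do" +
      "?item_id=" + encodeURIComponent(itemId) +
      "&p=" + page + "&ps=" + pageSize;
    const body = await fetchJson(url);
    totalPages = (body && body.total) || 0;
    reviews.push(...((body && body.items) || []));
  }
  return reviews;
}

// One possible fetcher, assuming a runtime with a global fetch (e.g. Node 18+).
async function fetchJsonOverHttp(url) {
  const response = await fetch(url);
  return response.json();
}

// Usage (not executed here):
// fetchAllReviews("xxxx", 15, fetchJsonOverHttp).then((r) => console.log(r.length));
```

A real scraper would probably also want a delay between requests and some error handling, both left out here to keep the loop visible.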

--------------------------------------------------------------------------
Update:
Scraping Taobao product attributes (combine the results of these two):
http://a.m.tmall.com/ajax/param.do?item_id=xx
http://a.m.tmall.com/ajax/sku.do?item_id=xx
--------------------------------------------------------------------------
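Fetching the two attribute endpoints together might look like the sketch below. The response formats are not covered in this post, so it only returns both raw results side by side and makes no attempt to merge them; fetchItemAttributes is a hypothetical name, and fetchJson is any function that takes a URL and resolves to parsed JSON.

```javascript
// Sketch: request param.do and sku.do for one item in parallel.
async function fetchItemAttributes(itemId, fetchJson) {
  const base = "http://a.m.tmall.com/ajax/";
  const query = "?item_id=" + encodeURIComponent(itemId);
  const [params, sku] = await Promise.all([
    fetchJson(base + "param.do" + query),
    fetchJson(base + "sku.do" + query),
  ]);
  // Response shapes are unknown here, so the two results are returned unmerged.
  return { params, sku };
}
```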

Original post: [Repost] Notes on Scraping Taobao Data, by 雪鼬博客

Tags: Taobao
