一只单身猿 (A Single Ape)



A C# Crawler for the Zhengfang Academic Affairs System

2016-06-03 · yejinmo

Note: this method only works for Zhengfang academic affairs systems that expose the default6.aspx page, i.e. the login page without a CAPTCHA.

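Open http://教务网站/default6.aspx (教务网站 stands for your school's academic affairs host) and you will see that logging in on this page requires no CAPTCHA. That means we only need to simulate the login and obtain a valid cookie; after that we can work through the pages step by step and scrape all the data we need.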


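First, use a packet-capture tool to find out what data the browser POSTs when logging in. (For lack of time this part is not covered here; the browser's built-in F12 developer tools or Wireshark will do the job.)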

In the captured login POST, two values deserve attention: __VIEWSTATE and ASP.NET_SessionId.

__VIEWSTATE is a hidden form field embedded in the page, so a regular expression extracts it easily.
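A slightly more defensive version of that extraction, as a standalone sketch: it checks Match.Success instead of indexing the first match (which throws when nothing matches) and returns null when the page has no __VIEWSTATE field, for example when the server answers with an error page. The class name ViewStateParser is hypothetical, and like the regex used later in this post it assumes the name attribute comes before value.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original program.
public static class ViewStateParser
{
    // Matches the hidden input ASP.NET WebForms renders, e.g.
    // <input type="hidden" name="__VIEWSTATE" value="dDwt..." />
    // Assumes name="__VIEWSTATE" appears before value="...".
    private static readonly Regex ViewStateRegex =
        new Regex("name=\"__VIEWSTATE\"\\s+value=\"([^\"]*)\"", RegexOptions.IgnoreCase);

    // Returns the raw (still Base64) value, or null if the field is missing.
    public static string Extract(string html)
    {
        Match m = ViewStateRegex.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```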

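ASP.NET_SessionId, on the other hand, is generated by the server: when we request the page, it comes back in the Set-Cookie response header and we simply extract it from there.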
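Rather than cutting the session ID out of Set-Cookie with a regular expression, System.Net.CookieContainer can parse the header itself. The sketch below (hypothetical SessionCookie class; jwxt.example.edu is a placeholder host) feeds a raw Set-Cookie value into a container and reads back both the session ID and the string to put in the Cookie header of later requests.

```csharp
using System;
using System.Net;

// Hypothetical helper class, not part of the original program.
public static class SessionCookie
{
    private static CookieContainer Load(Uri site, string setCookieHeader)
    {
        var container = new CookieContainer();
        // Parses "name=value" plus attributes such as "path=/" or "HttpOnly"
        container.SetCookies(site, setCookieHeader);
        return container;
    }

    // The "name=value" string a browser would send back, e.g. "ASP.NET_SessionId=..."
    public static string ToCookieHeader(Uri site, string setCookieHeader)
    {
        return Load(site, setCookieHeader).GetCookieHeader(site);
    }

    // Just the session ID value, or null if the server did not send one.
    public static string GetSessionId(Uri site, string setCookieHeader)
    {
        Cookie c = Load(site, setCookieHeader).GetCookies(site)["ASP.NET_SessionId"];
        return c == null ? null : c.Value;
    }
}
```

In a real crawler the simpler route is to create one CookieContainer and assign it to the CookieContainer property of every HttpWebRequest; the session cookie is then stored and sent automatically, without setting the Cookie header by hand.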

First, add the required namespaces:

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

Core code for opening the login page and obtaining the cookie and __VIEWSTATE:

                string cookie = string.Empty;
                string __VIEWSTATE = string.Empty;
                // GET the login page: the server issues a new ASP.NET_SessionId and
                // the page carries the __VIEWSTATE hidden field needed for the POST
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://" + HOST_IP + "/default6.aspx");
                request.CookieContainer = new CookieContainer();   // makes response.Cookies get filled from Set-Cookie
                request.Referer = "http://" + HOST_IP + "/default6.aspx";
                request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
                request.Headers["Accept-Language"] = "zh-CN,zh;q=0.8";
                request.Headers["Accept-Charset"] = "GBK,utf-8;q=0.7,*;q=0.3";
                request.UserAgent = "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1";
                request.KeepAlive = true;
                request.Method = "GET";
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
                {
                    string retString = reader.ReadToEnd();
                    // Match().Success instead of Matches()[0], which throws when nothing matches
                    Match viewState = Regex.Match(retString, "name=\"__VIEWSTATE\" value=\"(.+?)\"");
                    if (viewState.Success)
                        __VIEWSTATE = viewState.Groups[1].Value;
                    // Keep the cookie as "ASP.NET_SessionId=<value>", ready for the Cookie header
                    Cookie session = response.Cookies["ASP.NET_SessionId"];
                    if (session != null)
                        cookie = "ASP.NET_SessionId=" + session.Value;
                }
                if (string.IsNullOrEmpty(cookie) || string.IsNullOrEmpty(__VIEWSTATE))
                    return ReturnResult("Login Error");

Once we have both the cookie and __VIEWSTATE, we can simulate the login.

First, the data to be POSTed has to be formatted as a URL-encoded form body:

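                // __VIEWSTATE is Base64, so its '+', '=' and '/' are percent-encoded to survive form decoding.
                // Username and password are escaped too, in case they contain '&', '+' or '%'.
                // rblJs=%D1%A7%C9%FA and btnDl=%B5%C7+%C2%BC are "学生" (student) and "登 录" (log in) in GB2312.
                string postDataStr = string.Format("__VIEWSTATE={0}&tname=&tbtns=&tnameXw=yhdl&tbtnsXw=yhdl%7Cxwxsdl&txtYhm={1}&txtXm={2}&txtMm={2}&rblJs=%D1%A7%C9%FA&btnDl=%B5%C7+%C2%BC", __VIEWSTATE.Replace("+", "%2B").Replace("=", "%3D").Replace("/", "%2F"), Uri.EscapeDataString(username), Uri.EscapeDataString(password));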
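The chain of Replace calls on __VIEWSTATE works because Base64 text only contains letters, digits, '+', '/' and '='. The sketch below checks that reasoning by comparing the manual escaping with Uri.EscapeDataString on random Base64 strings; when they agree, Uri.EscapeDataString is a shorter drop-in. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper class, not part of the original program.
public static class ViewStateEncoding
{
    // The hand-written escaping used for __VIEWSTATE: '+' would otherwise be decoded
    // as a space, and '=' and '/' are escaped the way a browser form-encodes them.
    public static string ManualEscape(string base64)
    {
        return base64.Replace("+", "%2B").Replace("=", "%3D").Replace("/", "%2F");
    }

    // Returns true if ManualEscape and Uri.EscapeDataString agree on every
    // randomly generated Base64 string (fixed seed for reproducibility).
    public static bool SameAsEscapeDataString(int rounds, int seed)
    {
        var rng = new Random(seed);
        for (int i = 0; i < rounds; i++)
        {
            byte[] data = new byte[rng.Next(1, 64)];
            rng.NextBytes(data);
            string b64 = Convert.ToBase64String(data);
            if (ManualEscape(b64) != Uri.EscapeDataString(b64))
                return false;
        }
        return true;
    }
}
```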

Core code for the simulated login:

                // postDataStr: the form body built in the previous step
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://" + HOST_IP + "/default6.aspx");
                request.Method = "POST";
                request.Referer = "http://" + HOST_IP + "/default6.aspx";
                request.Host = HOST_IP;
                request.Headers["Origin"] = "http://" + HOST_IP;
                request.Headers["Upgrade-Insecure-Requests"] = "1";
                request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
                request.Headers["Accept-Language"] = "zh-CN,zh;q=0.8";
                request.Headers["Accept-Charset"] = "GBK,utf-8;q=0.7,*;q=0.3";
                request.Headers["Cookie"] = cookie;   // "ASP.NET_SessionId=..." obtained from the first GET
                request.UserAgent = "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1";
                request.KeepAlive = true;
                request.ContentType = "application/x-www-form-urlencoded";
                // After URL encoding the body is pure ASCII, so ContentLength is its byte count
                byte[] body = Encoding.ASCII.GetBytes(postDataStr);
                request.ContentLength = body.Length;
                using (Stream requestStream = request.GetRequestStream())
                    requestStream.Write(body, 0, body.Length);
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                {
                    // Redirects are followed automatically; on success the final URL ends with the user name
                    if (!response.ResponseUri.ToString().EndsWith(username))
                        return ReturnResult("username or password error.");
                }

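If everything goes well, we are now logged in: the server has marked the session behind this cookie as authenticated, and every request that carries the cookie will go straight through.

One thing to watch is the Referer header. It tells the server which page the request came from, and the server uses it to impose some browsing restrictions, so each request should carry the Referer a browser would have sent.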
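Putting the last two paragraphs together, every later request needs the session cookie plus a plausible Referer. Below is a minimal sketch of a request factory, assuming the same "ASP.NET_SessionId=..." string built earlier; the helper name, the host and the page path in the test are hypothetical, since the post does not list the data pages.

```csharp
using System;
using System.Net;

// Hypothetical helper class, not part of the original program.
public static class AuthenticatedRequest
{
    // Prepares (but does not send) a GET for a page behind the login.
    public static HttpWebRequest Create(string host, string path, string cookie, string referer)
    {
        var request = (HttpWebRequest)WebRequest.Create("http://" + host + path);
        request.Method = "GET";
        request.Headers["Cookie"] = cookie;   // identifies the logged-in session
        request.Referer = referer;            // the page a browser would have come from
        request.UserAgent = "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1";
        return request;
    }
}
```

Calling GetResponse() on the returned request and reading the stream works the same way as in the first snippet.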

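For reasons of space, click here to download the complete code.

The complete code is a CGI program; look up how to use CGI yourself = =
All it does is take parameters and return data...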
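For readers unfamiliar with CGI, here is a minimal sketch of the "take parameters, return data" side, assuming GET requests: the web server passes the query string in the QUERY_STRING environment variable and relays whatever the program writes to standard output, headers first, then a blank line, then the body. The parameter names user and pwd and the class name are hypothetical; the login steps above would go where the placeholder message is written.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical CGI skeleton, not the author's program.
public static class CgiEntry
{
    // Splits "a=1&b=2" into a dictionary; '+' means space in form encoding.
    public static Dictionary<string, string> ParseQuery(string query)
    {
        var result = new Dictionary<string, string>();
        foreach (string pair in (query ?? string.Empty).Split('&'))
        {
            int eq = pair.IndexOf('=');
            if (eq <= 0) continue;   // skip empty or nameless pairs
            string name = Uri.UnescapeDataString(pair.Substring(0, eq).Replace('+', ' '));
            string value = Uri.UnescapeDataString(pair.Substring(eq + 1).Replace('+', ' '));
            result[name] = value;
        }
        return result;
    }

    // Call this from Main when the program is run by the web server.
    public static void Run()
    {
        var query = ParseQuery(Environment.GetEnvironmentVariable("QUERY_STRING"));
        string user;
        query.TryGetValue("user", out user);
        // CGI response: header lines, an empty line, then the body
        Console.Write("Content-Type: text/plain; charset=utf-8\r\n\r\n");
        Console.Write(string.IsNullOrEmpty(user) ? "missing user" : "would log in as " + user);
    }
}
```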

