The code I will show you will also take care of authentication. If you ask for the stats page without first being authenticated, you get the authentication page instead. The code I’ll present will detect the login page, fill in the form automatically, submit it, then request the stats page again and finally extract the data.
For those not familiar with Blogger’s author interface and its stats page, the screen dump shows an actual view of the page. It shows the stats for this week (at the time of writing this article). What we are interested in is the column on the right showing “Pageviews today 149”, “Pageviews yesterday 434” and the two other lines. This is an HTML table that we have to extract from the document.
As I said above, before getting this stats page you must be authenticated. This means that if you are not, Blogger will show you the login page whatever page you asked for in the first place. For your reference, here is a screen dump of the authentication page:
On that page, we see a form with two fields, Email and Password, and a “Sign in” button to click. The program will locate those fields, assign them values and then click the button.
Document Object Model (DOM)
The World Wide Web Consortium (W3C) Document Object Model (DOM) is a platform- and language-neutral interface that permits programs or scripts to access and update the content, structure, and style of a document. The W3C DOM includes a model for how a standard set of objects representing HTML and XML documents are combined, and an interface for accessing and manipulating them.
Internet Explorer exposes the DOM through a set of COM interfaces available to external programs such as our Delphi application. This is documented on the MSDN website at:
I will only scratch the surface of the DOM: just enough to get you started and to accomplish the task of the sample application.
We saw in the previous article that we can connect to IE with this line:
FWebBrowser := CreateComObject(CLASS_InternetExplorer) as IWebBrowser2;
And that we can navigate to a URL with this line of code:
FWebBrowser.Navigate(Url, EmptyParam, EmptyParam, EmptyParam, EmptyParam);
To get hold of the interface that is the entry point to the DOM, we must get the document (whatever it is) and then get the interface to the HTML document (if it exists):
Doc := FWebBrowser.Document;
Doc.QueryInterface(IID_IHTMLDocument2, HtmlDoc);
Those lines of code are easy, but wait! There can be some glitches. Internet Explorer takes some time to fetch a URL and build the document. A document can be quite complex and can require a lot of downloads for HTML, images, CSS, scripts and more. And once everything is downloaded, scripts have to be executed. Various status indicators are available to make sure everything is OK. The WaitComplete method below takes a URL, navigates to it and waits until the HTML document interface is available and the document is ready:
function TQueryBloggerStatistics.WaitComplete(
    const URL : String = ''): IHTMLDocument2;
var
    Doc : IDispatch;
begin
    Result := nil;
    // Navigate only when a URL is given; otherwise just wait for the
    // navigation already in progress (for example after a form submit)
    if URL <> '' then
        FWebBrowser.Navigate(Url, EmptyParam, EmptyParam, EmptyParam, EmptyParam);
    // Wait until IE is no longer busy downloading
    while FWebBrowser.Busy do
        Sleep(250);
    // Wait until a document object exists
    while FWebBrowser.Document = nil do
        Sleep(250);
    Doc := FWebBrowser.Document;
    // Give up (returning nil) if this is not an HTML document
    if Doc.QueryInterface(IID_IHTMLDocument2, Result) <> S_OK then
        Exit;
    // Wait until the document reports readyState "complete"
    while not SameText(Result.readyState, 'complete') do
        Sleep(250);
end;
WaitComplete takes an optional URL and returns the IHTMLDocument2 interface required to handle the document. Checks are made to be sure everything is ready and complete. The code is quite straightforward, but each of these waits is needed.
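As written, WaitComplete loops forever if IE never becomes idle or the page never reaches “complete”, for instance when the network is down. Below is a minimal sketch of a time-limited wait, assuming Delphi 2009 or later (for anonymous methods). WaitUntil and TReadyTest are hypothetical names, not part of the article’s code.

```delphi
uses
  SysUtils, DateUtils;

type
  // Hypothetical: any parameterless test that returns True when ready
  TReadyTest = reference to function: Boolean;

// Calls IsReady every PollMs milliseconds until it returns True or
// TimeoutMs milliseconds have elapsed. Returns True if IsReady succeeded.
function WaitUntil(const IsReady: TReadyTest; TimeoutMs: Integer;
  PollMs: Cardinal = 250): Boolean;
var
  Start: TDateTime;
begin
  Start := Now;
  repeat
    if IsReady() then
      Exit(True);
    Sleep(PollMs);
  until MilliSecondsBetween(Now, Start) >= TimeoutMs;
  // Last check, so a condition that became true during the final
  // Sleep is not reported as a timeout
  Result := IsReady();
end;
```

Inside WaitComplete, the first loop could then become something like the line below. The 30-second limit is an arbitrary choice:
    if not WaitUntil(function: Boolean begin Result := not FWebBrowser.Busy; end, 30000) then Exit;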
Once we’ve got an IHTMLDocument2 interface, we can use it to traverse the document object model (DOM) to find the HTML elements we need and to get or set their properties.
The HTML document has a number of collections such as images, links, scripts and the like. There is also a special collection that returns absolutely every element. It is named “all”, and we will use it to find what we need. For example, in the login form we need to get hold of the HTML INPUT tag for each field and for the submit button. Each HTML element has a tag name such as “input” and may have an ID. The tag name is defined by the HTML standard, while the ID is chosen by the web developer, in this case by Blogger. Fortunately, Blogger uses very clear and meaningful IDs such as “Email” (for the Email input field), “Passwd” (for the password input field) and “Signin” (for the submit button).
Since we have to get hold of several HTML elements, I wrote a little function FindTag:
function TQueryBloggerStatistics.FindTag(
    const Coll    : IHTMLElementCollection;
    const TagName : String;
    const TagID   : String) : IHTMLElement;
var
    PDisp : IDispatch;
    Var2  : OleVariant;   // left Unassigned: the optional second argument of item
    I     : Integer;
begin
    if Assigned(Coll) then begin
        for I := 0 to Coll.Length - 1 do begin
            // item returns a generic IDispatch, not an element interface
            PDisp := Coll.item(I, Var2);
            // Query IHTMLElement to be able to read tagName and id
            if Assigned(PDisp) and
               (PDisp.QueryInterface(IID_IHTMLElement, Result) = S_OK) then begin
                if SameText(Result.tagName, TagName) and
                   SameText(Result.Id, TagID) then
                    Exit;
            end;
        end;
    end;
    Result := nil;
end;
FindTag has to be called like this:
HtmlElem := FindTag(FHtmlDoc.All, 'INPUT', 'EMail');
if Assigned(HtmlElem) then
    HtmlElem.setAttribute('Value', FUserEMail, 0);
This excerpt finds the “input” tag whose ID is “Email”. The result, if found, is the interface used to handle that HTML element. Here I use the interface to set the attribute “value” to the user’s email address (variable FUserEMail holds the email address).
FindTag’s code is relatively simple, although accessing the collection items is a little bit tricky: item returns a generic IDispatch that must be queried for the IHTMLElement interface before we can read the tag name and ID. Sorry, but this is how Microsoft designed IE to expose the DOM.
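The excerpt uses setAttribute on the generic IHTMLElement. As an alternative, assuming the element really is an INPUT tag, the value can be set through the typed IHTMLInputElement interface from the same MSHTML import unit the article’s code already relies on. SetInputValue is a hypothetical helper, not part of the original program:

```delphi
uses
  SysUtils, MSHTML;

// Hypothetical helper: sets the text of an INPUT element through its
// typed interface. Returns False if Elem is nil or is not an INPUT element.
function SetInputValue(const Elem: IHTMLElement; const Value: string): Boolean;
var
  Input: IHTMLInputElement;
begin
  // Supports returns False for a nil Elem, so a failed FindTag is harmless
  Result := Supports(Elem, IHTMLInputElement, Input);
  if Result then
    Input.value := Value;
end;
```

It could then be called directly on the FindTag result, for example SetInputValue(FindTag(FHtmlDoc.All, 'INPUT', 'EMail'), FUserEMail).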
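FindTag walks every element of the “all” collection. A possible shortcut, swapped in here and not what the article uses, is the getElementById method of the IHTMLDocument3 interface, also declared in the MSHTML unit. FindTagById is a hypothetical name. The sketch still checks the tag name the way FindTag does:

```delphi
uses
  SysUtils, MSHTML;

// Hypothetical alternative to FindTag: ask MSHTML for the element by ID,
// then confirm its tag name instead of scanning the whole "all" collection.
function FindTagById(const Doc: IHTMLDocument2;
  const TagName, TagID: string): IHTMLElement;
var
  Doc3: IHTMLDocument3;
begin
  Result := nil;
  // IHTMLDocument3 is where getElementById is declared
  if not Supports(Doc, IHTMLDocument3, Doc3) then
    Exit;
  Result := Doc3.getElementById(TagID);
  // Reject an element whose tag name differs from the one requested
  if Assigned(Result) and not SameText(Result.tagName, TagName) then
    Result := nil;
end;
```

The tag-name check is worth keeping because, in older document modes, IE’s getElementById is known to also match elements by their name attribute.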
Detecting and handling the login page
The code I’ll show you below queries a web page by its URL. Here this URL is supposed to be the stats page of a given Blogger blog. We’ll come back to that URL later. The code uses WaitComplete to fetch the URL and wait until the page is ready and complete, then uses FindTag to see if the page contains an “input” tag with the ID “Email”. If it does, we assume we have received the login page. The code then fetches, one after the other, all the other required tags in that page, fills them with the user data and finally calls the “click” method of the HTML element that is the submit button. And guess what… IE sends the form to Blogger and authentication takes place.
FHtmlDoc := WaitComplete(URL);
if not Assigned(FHtmlDoc) then
    Exit;
// Check for the login page.
// If found, fill in the form and submit it before continuing
HtmlElem := FindTag(FHtmlDoc.All, 'INPUT', 'EMail');
if Assigned(HtmlElem) then begin
    HtmlElem.setAttribute('Value', FUserEMail, 0);
    HtmlElem := FindTag(FHtmlDoc.All, 'INPUT', 'Passwd');
    if Assigned(HtmlElem) then begin
        HtmlElem.setAttribute('Value', FUserPassword, 0);
        // PersistentCookie check box, if present
        HtmlElem := FindTag(FHtmlDoc.All, 'INPUT', 'PersistentCookie');
        if Assigned(HtmlElem) then
            HtmlElem.setAttribute('Checked', '', 0);
        HtmlElem := FindTag(FHtmlDoc.All, 'INPUT', 'Signin');
        if Assigned(HtmlElem) then begin
            // Clicking the submit button makes IE post the form to Blogger
            HtmlElem.click;
            Display('Login...');
            // We have found the login form and must wait for login to occur
            FHtmlDoc := WaitComplete;
            if not Assigned(FHtmlDoc) then
                Exit;
            // Login is finished, we must navigate again to the target URL
            FHtmlDoc := WaitComplete(URL);
            if not Assigned(FHtmlDoc) then
                Exit;
            // Still getting the login page means the login was rejected
            HtmlElem := FindTag(FHtmlDoc.All, 'INPUT', 'EMail');
            if Assigned(HtmlElem) then begin
                Display('Login failed');
                Exit;
            end;
        end;
    end;
end;
The next step is to extract the statistics from the stats page.
We will do that in the next article. Stay tuned!
Also read part 1 and part 2.
Visit my website: http://www.overbyte.be