In this article, I will explain how to extract statistics from the Blogger stats page. This follows the previous article, in which you learned how to automate the login process and get the stats page.
The stats page is organized in a number of HTML elements. The one which interests us is a table. Since there are many tables in the page, I had to find a way to detect the correct one, even if the page layout changes.
The idea is to enumerate all the HTML elements in the page, check for the table tag and check the labels against string constants. This is easy since the table is organized in two columns, one with the label and one with the number.
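The label check described above can be sketched on plain strings, without any browser involved. This is only an illustration: the function name LooksLikeStatsTable is hypothetical, and apart from "Pageviews today" the label strings are assumptions (the real ones depend on the language of the stats page).

```delphi
program DetectStatsTable;

{$APPTYPE CONSOLE}

uses
    SysUtils;

const
    // Only 'Pageviews today' is taken from the article; the three other
    // labels are assumed and must be adapted to the real page.
    StatsLabels : array [0..3] of String = (
        'Pageviews today',
        'Pageviews yesterday',
        'Pageviews last month',
        'Pageviews all time history');

// Hypothetical helper: returns TRUE when every expected label occurs
// somewhere in the raw text of a table.
function LooksLikeStatsTable(const TableText : String) : Boolean;
var
    I : Integer;
begin
    Result := FALSE;
    for I := Low(StatsLabels) to High(StatsLabels) do begin
        if Pos(StatsLabels[I], TableText) <= 0 then
            Exit;    // One label is missing: this is not the stats table
    end;
    Result := TRUE;
end;

begin
    // Sample strings standing in for the raw text of two tables
    WriteLn(LooksLikeStatsTable('Pageviews today 12 ' +
                                'Pageviews yesterday 34 ' +
                                'Pageviews last month 567 ' +
                                'Pageviews all time history 8,901'));
    WriteLn(LooksLikeStatsTable('Entry Pageviews'));
end.
```

The first call prints TRUE and the second prints FALSE. Because the test relies on the labels only, it keeps working when the table moves elsewhere in the page.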
Iterating over all the HTML elements is easy. As we saw in the previous article, the HTML document has a property named “all” which is a collection (a kind of array). It is enough to enumerate it and, for each item in the collection, query the interface IID_IHTMLElement to get hold of the HTML element.
Having the HTML element, we can check the tagName property, which is actually the tag type. We are looking for ‘TABLE’ (the comparison is not case sensitive). If it is a table, we get the innerText property which, as its name implies, is the raw text inside the tag. Raw text means it is the text without any embedded tags. In the case of an HTML table, we get the content of all table cells. We will search that text for labels such as “Pageviews today” and then extract the number just after. For that purpose I wrote a little utility function I will show you in a moment.
Here is the code to do what I’ve just described:
    Coll := FHtmlDoc.all;
    for I := 0 to Coll.Length - 1 do begin
        pDisp := Coll.item(I, var2);
        if pDisp.QueryInterface(IID_IHTMLElement, HtmlElem) = S_OK then begin
            if SameText(HtmlElem.tagName, 'TABLE') then begin
                Txt := String(HtmlElem.innertext);
                if not ExtractNumberAfterText(Txt, TxtToday, CountToday) then
                    continue;
                if not ExtractNumberAfterText(Txt, TxtYesterday, CountYesterday) then
                    continue;
                if not ExtractNumberAfterText(Txt, TxtLastMonth, CountLastMonth) then
                    continue;
                if not ExtractNumberAfterText(Txt, TxtAllTime, CountAllTime) then
                    continue;
                Buf := AnsiString(
                           FormatDateTime('YYYY/MM/DD;HH:NN:SS;', Now) +
                           '"_' + FBlogId + '";' +
                           IntToStr(CountToday) + ';' +
                           IntToStr(CountYesterday) + ';' +
                           IntToStr(CountLastMonth) + ';' +
                           IntToStr(CountAllTime));
                Result := TRUE;
                break;
            end;
        end;
    end;
The utility function ExtractNumberAfterText is plain Delphi code which parses the string. We just have to take care to skip all spaces and line breaks between the label and the number because they are not significant in HTML.
    function ExtractNumberAfterText(
        const Source : String;
        const Text   : String;
        out   Number : Integer) : Boolean;
    var
        J : Integer;
    begin
        Result := FALSE;
        Number := 0;
        J      := Pos(Text, Source);
        if J <= 0 then
            Exit;
        // Search for first digit right after searched text,
        // ignore anything not a digit
        J := J + Length(Text);
        while (J <= Length(Source)) and
              (not CharInSet(Source[J], ['0'..'9'])) do
            Inc(J);
        // No digit at all after the searched text: nothing to extract
        if J > Length(Source) then
            Exit;
        // After first digit, scan all digits and ',' or '.' which are
        // used as thousand separator (depends on language, any will do)
        repeat
            // If we have a digit, use it to build the final number
            if CharInSet(Source[J], ['0'..'9']) then
                Number := Number * 10 + Ord(Source[J]) - Ord('0');
            Inc(J);
        until (J > Length(Source)) or
              (not CharInSet(Source[J], ['0'..'9', ',', '.']));
        Result := TRUE;
    end;
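To see how the function behaves, here is a small console sketch. It repeats the function so that it compiles on its own, with a guard for the case where no digit follows the label. The sample text and the label 'Pageviews yesterday' are made up for the demonstration and only imitate what the raw text of the table might look like.

```delphi
program ExtractDemo;

{$APPTYPE CONSOLE}

uses
    SysUtils;

function ExtractNumberAfterText(
    const Source : String;
    const Text   : String;
    out   Number : Integer) : Boolean;
var
    J : Integer;
begin
    Result := FALSE;
    Number := 0;
    J      := Pos(Text, Source);
    if J <= 0 then
        Exit;
    // Skip everything up to the first digit after the searched text
    J := J + Length(Text);
    while (J <= Length(Source)) and
          (not CharInSet(Source[J], ['0'..'9'])) do
        Inc(J);
    if J > Length(Source) then
        Exit;    // No digit found after the label
    // Accumulate digits, stepping over thousand separators
    repeat
        if CharInSet(Source[J], ['0'..'9']) then
            Number := Number * 10 + Ord(Source[J]) - Ord('0');
        Inc(J);
    until (J > Length(Source)) or
          (not CharInSet(Source[J], ['0'..'9', ',', '.']));
    Result := TRUE;
end;

const
    // Made-up sample: labels and numbers separated by line breaks and spaces
    SampleText = 'Pageviews today'#13#10'  1,234'#13#10 +
                 'Pageviews yesterday'#13#10'987';
var
    Count : Integer;
begin
    if ExtractNumberAfterText(SampleText, 'Pageviews today', Count) then
        WriteLn('Today: ', Count);
    if ExtractNumberAfterText(SampleText, 'Pageviews yesterday', Count) then
        WriteLn('Yesterday: ', Count);
    if not ExtractNumberAfterText(SampleText, 'No such label', Count) then
        WriteLn('Label not found');
end.
```

The program prints "Today: 1234", "Yesterday: 987" and "Label not found". The thousand separator in "1,234" is stepped over and the line breaks between label and number are ignored.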
About the design of the application
I explained how to automate Internet Explorer and showed the actual code used. But I didn’t give any explanation of how I designed the whole application.
I always like to separate the user interface from data processing. For that purpose, I created two source files: one with the user interface and one with a class having the automation code.
My user interface is very basic: a simple form with a memo showing messages about what is going on. I could just as well have written a console mode application or a service application. This doesn’t really matter.
My data processing code is encapsulated in a class I named TQueryBloggerStatistics. The name says what it does. The class is a kind of container. It exposes the few methods and properties needed to drive that kind of automation.
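The separation between user interface and processing can be sketched with an event, which is presumably the role of the OnDisplay property of TQueryBloggerStatistics. In the sketch below, TWorker and TConsoleUI are hypothetical stand-ins, and the signature of TDisplayEvent is an assumption since its declaration is not shown in this article.

```delphi
program SeparationDemo;

{$APPTYPE CONSOLE}

uses
    SysUtils;

type
    // Assumed signature; the real TDisplayEvent may differ
    TDisplayEvent = procedure (Sender : TObject;
                               const Msg : String) of object;

    // Hypothetical stand-in for the data processing class.
    // It knows nothing about forms, memos or the console.
    TWorker = class
    private
        FOnDisplay : TDisplayEvent;
        procedure Display(const Msg : String);
    public
        function Execute : Boolean;
        property OnDisplay : TDisplayEvent read  FOnDisplay
                                           write FOnDisplay;
    end;

    // Hypothetical console front end. A form with a memo would
    // have a similar handler adding the message to the memo.
    TConsoleUI = class
    public
        procedure WorkerDisplay(Sender : TObject; const Msg : String);
    end;

procedure TWorker.Display(const Msg : String);
begin
    // Forward the message only if someone is listening
    if Assigned(FOnDisplay) then
        FOnDisplay(Self, Msg);
end;

function TWorker.Execute : Boolean;
begin
    Display('Starting');
    // The real processing would take place here
    Display('Done');
    Result := TRUE;
end;

procedure TConsoleUI.WorkerDisplay(Sender : TObject; const Msg : String);
begin
    WriteLn(Msg);
end;

var
    UI     : TConsoleUI;
    Worker : TWorker;
begin
    UI     := TConsoleUI.Create;
    Worker := TWorker.Create;
    try
        Worker.OnDisplay := UI.WorkerDisplay;
        Worker.Execute;
    finally
        Worker.Free;
        UI.Free;
    end;
end.
```

With this arrangement, switching from a form to a console or a service only means providing another event handler. The processing class stays unchanged.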
The class declaration is as follows:
    TQueryBloggerStatistics = class
    private
        FWebBrowser   : IWebBrowser2;
        FBlogID       : String;
        FUserEMail    : String;
        FUserPassword : String;
        FLogFileName  : String;
        FVisible      : Boolean;
        FOnDisplay    : TDisplayEvent;
        function  WaitComplete(const URL : String = ''): IHTMLDocument2;
        function  FindTag(const Coll : IHTMLElementCollection;
                          const TagName, TagID: String): IHTMLElement;
        procedure Display(const Msg : String);
    public
        constructor Create;
        function  Execute : Boolean;
        procedure Quit;
        procedure LoadConfig(const IniFileName : String); overload;
        procedure LoadConfig; overload;
        function  SaveConfig(const IniFileName: String) : Boolean; overload;
        function  SaveConfig : Boolean; overload;
        property BlogID       : String        read  FBlogID
                                              write FBlogID;
        property UserEMail    : String        read  FUserEMail
                                              write FUserEMail;
        property UserPassword : String        read  FUserPassword
                                              write FUserPassword;
        property LogFileName  : String        read  FLogFileName
                                              write FLogFileName;
        property Visible      : Boolean       read  FVisible
                                              write FVisible;
        property OnDisplay    : TDisplayEvent read  FOnDisplay
                                              write FOnDisplay;
    end;
I won’t reproduce the implementation here because I have already shown the most interesting parts. You can download the full source code for the class and the complete demo application from my website at:
http://www.overbyte.be/frame_index.html?redirTo=/blog_source_code.html
Previous article: http://francois-piette.blogspot.be/2013/05/internet-explorer-automation-part-3.html
Visit my website: http://www.overbyte.be