May 21, 2013

Internet Explorer Automation Part 4


In this article, I will explain how to extract statistics from the Blogger stats page. This follows the previous article, in which you learned how to automate the login process and get the stats page.

The stats page is made up of a number of HTML elements. The one we are interested in is a table. Since there are many tables in the page, I had to find a way to detect the correct one, even if the page layout changes.

The idea is to enumerate all the HTML elements in the page, check for the table tag and check the labels against string constants. This is easy since the table is organized in two columns, one with the label and one with the number.

Iterating over all the HTML elements is easy. As we saw in the previous article, the HTML document has a property named “all” which is a collection (a kind of array). It is enough to enumerate it and, for each item in the collection, query the IHTMLElement interface (IID_IHTMLElement) to get hold of the HTML element.

Having the HTML element, we can check the tagName property, which holds the tag type. We are looking for “TABLE”; the code compares with SameText, so case does not matter. If it is a table, we get the innerText property which, as its name implies, is the raw text inside the tag. Raw text means the text without any embedded tags. In the case of an HTML table, we get the content of all table cells. We search that text for labels such as “Pageviews today” and then extract the number that follows. All four labels must be found, otherwise we move on to the next table. For that purpose I wrote a little utility function I will show you in a moment.

Here is the code to do what I’ve just described:
    Coll := FHtmlDoc.all;
    for I := 0 to Coll.Length - 1 do begin
        pDisp := Coll.item(I, var2);
        if pDisp.QueryInterface(IID_IHTMLElement, HtmlElem) = S_OK then begin
            if SameText(HtmlElem.tagName, 'TABLE') then begin
                Txt := String(HtmlElem.innertext);
                if not ExtractNumberAfterText(Txt, TxtToday,
                                              CountToday) then
                    continue;
                if not ExtractNumberAfterText(Txt, TxtYesterday,
                                              CountYesterday) then
                    continue;
                if not ExtractNumberAfterText(Txt, TxtLastMonth,
                                              CountLastMonth) then
                    continue;
                if not ExtractNumberAfterText(Txt, TxtAllTime,
                                              CountAllTime) then
                    continue;

                Buf := AnsiString(
                          FormatDateTime('YYYY/MM/DD;HH:NN:SS;', Now) +
                          '"_' + FBlogId + '";' +
                          IntToStr(CountToday) + ';' +
                          IntToStr(CountYesterday) + ';' +
                          IntToStr(CountLastMonth) + ';' +
                          IntToStr(CountAllTime));
                Result := TRUE;
                break;
            end;
        end;
    end;
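
One detail about the log line built above: in a FormatDateTime pattern, '/' and ':' stand for the date and time separators of the current locale, so the exact output may vary with regional settings. Here is a minimal sketch, assuming a fixed layout is wanted (which may or may not matter for your use), that quotes the separators to make them literal. BuildLogLine, its parameters and the sample values are hypothetical names and data used only for this illustration.

```delphi
program LogLineSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
    SysUtils;

// Hypothetical helper, not part of the original class.
// Builds the same semicolon-separated line as the code above.
function BuildLogLine(
    const When           : TDateTime;
    const BlogId         : String;
    const CountToday     : Integer;
    const CountYesterday : Integer;
    const CountLastMonth : Integer;
    const CountAllTime   : Integer) : String;
begin
    // Quoted "/" and ":" are emitted literally, whatever the locale is
    Result := FormatDateTime('YYYY"/"MM"/"DD";"HH":"NN":"SS";"', When) +
              '"_' + BlogId + '";' +
              IntToStr(CountToday)     + ';' +
              IntToStr(CountYesterday) + ';' +
              IntToStr(CountLastMonth) + ';' +
              IntToStr(CountAllTime);
end;

begin
    // Sample values are made up for the demonstration
    WriteLn(BuildLogLine(EncodeDate(2013, 5, 21) + EncodeTime(10, 15, 0, 0),
                         '1234567890', 42, 57, 1500, 12345));
end.
```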

The utility function ExtractNumberAfterText is rather simple: it is plain Delphi code to parse the string. We just have to take care to skip all spaces and line breaks because they are not significant in HTML.

function ExtractNumberAfterText(
    const Source : String;
    const Text   : String;
    out   Number : Integer) : Boolean;
var
    J : Integer;
begin
    Result := FALSE;
    Number := 0;
    J := Pos(Text, Source);
    if J <= 0 then
        Exit;
    // Search for first digit right after searched text,
    // ignore anything not a digit
    J := J + Length(Text);
    while (J <= Length(Source)) and
          (not CharInSet(Source[J], ['0'..'9'])) do
        Inc(J);
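    // No digit at all after the searched text: nothing to extract
    // (also avoids reading past the end of Source below)
    if J > Length(Source) then
        Exit;
    // After first digit, scan all digits and ',' or '.' which are
    // used as thousand separators (depends on language, any will do)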
    repeat
        // If we have a digit, use it to build the final number
        if CharInSet(Source[J], ['0'..'9']) then
            Number := Number * 10 + Ord(Source[J]) - Ord('0');
        Inc(J);
    until (J > Length(Source)) or
          (not CharInSet(Source[J], ['0'..'9', ',', '.']));
    Result := TRUE;
end;
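
To see the function at work outside of the browser, here is a small console sketch. The sample text only imitates what innerText might return for the stats table, and the label “Pageviews yesterday” is a made-up example. The copy of the function included to make the program self-contained also returns FALSE when no digit follows the label.

```delphi
program ExtractDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
    SysUtils;

function ExtractNumberAfterText(
    const Source : String;
    const Text   : String;
    out   Number : Integer) : Boolean;
var
    J : Integer;
begin
    Result := FALSE;
    Number := 0;
    J := Pos(Text, Source);
    if J <= 0 then
        Exit;
    // Skip everything up to the first digit after the searched text
    J := J + Length(Text);
    while (J <= Length(Source)) and
          (not CharInSet(Source[J], ['0'..'9'])) do
        Inc(J);
    // No digit found after the label
    if J > Length(Source) then
        Exit;
    // Accumulate digits, ignoring ',' and '.' thousand separators
    repeat
        if CharInSet(Source[J], ['0'..'9']) then
            Number := Number * 10 + Ord(Source[J]) - Ord('0');
        Inc(J);
    until (J > Length(Source)) or
          (not CharInSet(Source[J], ['0'..'9', ',', '.']));
    Result := TRUE;
end;

const
    // Made-up sample: labels and numbers separated by line breaks
    Sample = 'Pageviews today'#13#10'  42'#13#10 +
             'Pageviews yesterday'#13#10'1,234';
var
    N : Integer;
begin
    if ExtractNumberAfterText(Sample, 'Pageviews today', N) then
        WriteLn('Today: ', N);
    if ExtractNumberAfterText(Sample, 'Pageviews yesterday', N) then
        WriteLn('Yesterday: ', N);
    // This label is absent from the sample, so the function returns FALSE
    if not ExtractNumberAfterText(Sample, 'Pageviews last month', N) then
        WriteLn('Last month: label not found');
end.
```

Note that Pos is case sensitive, so the labels must match the page text exactly.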

About the design of the application


I explained how to automate Internet Explorer and showed the actual code used, but I didn’t give any explanation of how I designed the whole application.

I always like to separate the user interface from data processing. For that purpose, I created two source files: one with the user interface and one with a class having the automation code.

My user interface is very basic: a simple form with a memo showing messages about what is going on. I could just as well have written a console mode application or a service application. It doesn’t really matter.

My data processing code is encapsulated in a class I named TQueryBloggerStatistics. The name says what it does. The class is a kind of container: it exposes the few methods and properties needed for that kind of automation.

The class declaration is as follows:

    TQueryBloggerStatistics = class
    private
        FWebBrowser   : IWebBrowser2;
        FBlogID       : String;
        FUserEMail    : String;
        FUserPassword : String;
        FLogFileName  : String;
        FVisible      : Boolean;
        FOnDisplay    : TDisplayEvent;
        function WaitComplete(const URL : String = ''): IHTMLDocument2;
        function FindTag(const Coll    : IHTMLElementCollection;
                         const TagName, TagID: String): IHTMLElement;
        procedure Display(const Msg : String);
    public
        constructor Create;
        function  Execute : Boolean;
        procedure Quit;
        procedure LoadConfig(const IniFileName : String); overload;
        procedure LoadConfig; overload;
        function  SaveConfig(const IniFileName: String) : Boolean; overload;
        function  SaveConfig : Boolean; overload;
        property  BlogID       : String        read  FBlogID
                                               write FBlogID;
        property  UserEMail    : String        read  FUserEMail
                                               write FUserEMail;
        property  UserPassword : String        read  FUserPassword
                                               write FUserPassword;
        property  LogFileName  : String        read  FLogFileName
                                               write FLogFileName;
        property  Visible      : Boolean       read  FVisible
                                               write FVisible;
        property  OnDisplay    : TDisplayEvent read  FOnDisplay
                                               write FOnDisplay;
    end;
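
As a rough illustration of the separation between user interface and processing, the following console sketch shows how an OnDisplay event can carry messages from the class to whatever front end is used. It is not the actual implementation: TQueryStub is a trimmed stand-in for TQueryBloggerStatistics, TConsoleUI is a hypothetical front end, and the signature given to TDisplayEvent is an assumption since its declaration is not shown above.

```delphi
program DisplayEventSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
    SysUtils;

type
    // Assumed signature, the real declaration may differ
    TDisplayEvent = procedure (Sender : TObject;
                               const Msg : String) of object;

    // Trimmed stand-in for the class: only the display plumbing is kept
    TQueryStub = class
    private
        FOnDisplay : TDisplayEvent;
        procedure Display(const Msg : String);
    public
        function  Execute : Boolean;
        property  OnDisplay : TDisplayEvent read  FOnDisplay
                                            write FOnDisplay;
    end;

    // Plays the role of the user interface (a form with a memo in the demo)
    TConsoleUI = class
    public
        procedure QueryDisplay(Sender : TObject; const Msg : String);
    end;

procedure TQueryStub.Display(const Msg : String);
begin
    // The class never touches the user interface directly
    if Assigned(FOnDisplay) then
        FOnDisplay(Self, Msg);
end;

function TQueryStub.Execute : Boolean;
begin
    Display('Querying statistics...');  // placeholder for the automation
    Result := TRUE;
end;

procedure TConsoleUI.QueryDisplay(Sender : TObject; const Msg : String);
begin
    WriteLn(Msg);
end;

var
    UI    : TConsoleUI;
    Query : TQueryStub;
begin
    UI    := TConsoleUI.Create;
    Query := TQueryStub.Create;
    try
        Query.OnDisplay := UI.QueryDisplay;
        if Query.Execute then
            WriteLn('Done');
    finally
        Query.Free;
        UI.Free;
    end;
end.
```

Swapping the console for a form or a service only means providing another handler for OnDisplay; the class itself stays unchanged.
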

I won’t reproduce the implementation here because I have already shown the most interesting parts. You can download the full source code for the class and the complete demo application from my website at:
http://www.overbyte.be/frame_index.html?redirTo=/blog_source_code.html

Previous article: http://francois-piette.blogspot.be/2013/05/internet-explorer-automation-part-3.html

Visit my website: http://www.overbyte.be
