Using REST API in SSIS to Extract Top 100 User Tweets

Abstract

The growing popularity of the use of social networks by businesses indicate a possible change in ETL requirements. Part of this change is that developers had to revise existing ETL model such that it caters for sourcing of data off the internet in addition to the traditional flat files or OLE/ODBC Connections. In this article, I will demonstrate extracting user tweets using Twitter’s REST API inside an HTTP Connection Manager.

Requirements

Article

Let’s begin by launching Microsoft Visual Studio and create a new Integration Services Project which is located under Business Intelligence Projects category. (If you are using SQL Server 2012 or later, you will have to launch SQL Server Data Tools – SSDT).

After you have assigned a project name, proceed to click and drag the Data Flow Task (DFT) into Control Flow pane from toolbox (SSIS Toolbox in SSDT). I decided to name my DFT as DFT – Retrieve Tweets

Add the following two connections:

  • HTTP Connection Manager
  • OLE DB Connection

I have configured my HTTP Connection as follows:

  • Server URL = http://api.twitter.com/1/statuses/user_timeline.rss?screen_name=mafiswana&count=100

Note that you can replace the value of parameter screen_name with your twitter user account instead of mafiswana.

I have configured my OLE DB Connection as follows:

  • Server = Localhost
  • Database Name = selectSifiso

So far the package should look as below:

Let’s go back and edit the Data Flow Task.

Add a Script Component (SC) which is located under Data Flow Transformations.

Select Source as a script component type.

Edit the script SC and click on the Connection Managers tab on the left of the Script Transformation Editor. Under the Connection Manager column click to select the HTTP Connection Manager (it should be the only connection available).

Let’s move on to the Inputs and Outputs tab in the Script Transformation Editor. In here we will configure the source output columns.

Collapse Output 0 node and click on Output Columns. Click Add Column and the following columns with properties configured as below:

  • Name = sS_Tweets; DataType = string [DT_STR]; Length = 200
  • Name = sS_TwitterDate; DataType = string [DT_STR]; Length = 200

Now let’s move on to the Script tab in the Script Transformation Editor.

In Object Explorer, right click on References and Click Add. Under .NET tab add the following references:

  • System.ServiceModel
  • System.ServiceModel.Web

If one or none of above references are not available for selection, please ensure that you have .NET 3.5 Framework selected. You can do that by clicking on Properties – Application – Target Framework – .NET Framework 3.5

In addition to the already referenced namespaces, add the following namespaces:

  • using System.Xml;
  • using System.ServiceModel.Syndication;

Declare the following variables in the public class ScriptMain : UserComponent

  • private SyndicationFeed sS_Tweets = null;
  • private XmlReader sS_XmlReader = null;

Under the PreExecute() method, add the following:

  • sS_XmlReader = XmlReader.Create(Connections.Connection.ConnectionString);
  • sS_Tweets = SyndicationFeed.Load(sS_XmlReader);

Under the public override void CreateNewOutputRows() method, add the following:
if (sS_Tweets != null)

{

foreach (var item in sS_Tweets.Items)

{

Output0Buffer.AddRow();

Output0Buffer.sS_Tweets= item.Title.Text;

Output0Buffer.sS_TwitterDate= item.PublishDate.ToString();

}

Output0Buffer.SetEndOfRowset();

}

The complete script code can be found here – tweets_ssis

Now that we have configured the Script Component as Source let us add an OLE DB Destination Component (ODD) and connect Script Component to the Destination Component.

I have configured the OLE DB connection manager of the ODD to use the selectSifiso connection. The data access mode is a Table or view – fast load. Under Name of the table or the view, click New and create and output table. Click on Mappings tab and ensure that the source-to-destination mappings are correct.

Your complete package data flow task should look as follows:

Conclusion

All left to do is to run the package and your tweets will be extracted and stored in SQL Server.

It’s that simple.

Till next time folks, cheers.

Sifiso


 

2 Comments

  1. O Ruiz says:

    I followed the steps and when I run the package, I get this message:

    [Script Component [2]] Error: System.Xml.XmlException: The element with name ‘html’ and namespace ” is not an allowed feed format.
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.HandleUserException(Exception e)
    at Microsoft.SqlServer.Dts.Pipeline.ScriptComponentHost.PreExecute()
    at Microsoft.SqlServer.Dts.Pipeline.ManagedComponentHost.HostPreExecute(IDTSManagedComponentWrapper100 wrapper)

    Any ideas?

  2. Leonard says:

    I am extremely impressed along with your writing abilities as well as with
    the layout on your weblog. Is that this a paid topic
    or did you customize it yourself? Either
    way stay up the nice quality writing, it is rare to look a great weblog like this one today.

    .

Leave a comment

%d bloggers like this: