Sunday, 23 February 2014

Python and Excel/VBA

Today we are going to talk about how easy it could be run Python's scripts from Excel, which might be considered pretty interesting. Basically we will create a VBA code which launches a console running the selected script. This script creates an output which must be printed on the same excel file from where it was launched.

Continuing with the previous post, we will use the scraping script to print the extracted data into excel. First we will change a bit the previous code and we will create a class called "collect". UPDATE: The website has changed and the previous code doesn't work any more!

import numpy as np
import urllib2
import urlparse
import re
from bs4 import BeautifulSoup


class collect:

    def __init__(self, mainSites):
        self.sites = mainSites

    def connect(self, site):
        hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)',
                'Accept': 'text/html,application/xhtml+xml,application/xml'
                ';q=0.9,*/*;q=0.8',
               'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
               'Accept-Encoding': 'none',
               'Accept-Language': 'en-US,en;q=0.8',
               'Connection': 'keep-alive'}
        req = urllib2.Request(site, headers=hdr)
        page = urllib2.urlopen(req).read()
        soup = BeautifulSoup(page)
        return soup

    def extractUrls(self):
        FxUrls = []
        C1 = []
        C2 = []
        for mainSite in self.sites:
            soup = self.connect(mainSite)
            for link in soup.find_all('td', {'class': 'bold left noWrap'}):
                path = 'http://www.investing.com/currencies/'
                url = link.find(href=re.compile(path)).get('href')
                FxUrls.append(url)
                curr = urlparse.urlparse(url).path.split("/")[2].split("-")
                C1.append(curr[0])
                C2.append(curr[1])
        return np.vstack((FxUrls, C1, C2))

    def extractData(self):
        Value = []
        Data = self.extractUrls()
        for link in Data[0, :]:
            soup = self.connect(link)
            for span in soup.find_all('span', {'id': 'last_last'}):
                value = (span.text).replace(',', '')
                Value.append(value)
        return np.vstack((Data, Value))
Once we have the script is time to create the interaction with Excel. In order to do this operation we will use an excellent module: pyvot. I have previously prepared some special ranges for printing the data (I recommend Name ranges as a good Excel/VBA practice). The script below will create a connection with Excel, just selecting the desired ranges and sending the values. This script will also need some arguments, the workbook name from where the script is launched and our links.

import sys
import scraping
import xl

#arguments:
excelWb = sys.argv[1]
print "Excel workbook: ", excelWb

mainSites = []
for i in range(len(sys.argv) - 2):
    mainSites.append(str(sys.argv[i + 2]))
print "urls: ", mainSites

scr = scraping.collect(mainSites)
data = scr.extractData()

rngCurr1 = data[1, :]
rngCurr2 = data[2, :]
rngValue = data[3, :]

"""Excel connection"""
wb = xl.Workbook(excelWb)
wb.range("rngCurr1").set(rngCurr1)
wb.range("rngCurr2").set(rngCurr2)
wb.range("rngRate").set(rngValue)

Excel template

Ranges

Finally, we need a VBA code which can run a script through the Windows console. First we will be calling a function to clear ranges contents and the main function afterwards. The main function, called runScript, creates an object Shell and runs the script with its arguments, which are: workbook name and array of urls.

Option Explicit

Sub clean()

Dim rngForex As Range
Set rngForex = Sheet1.Range("rngForex")
rngForex.ClearContents

End Sub

Sub runScript()

Dim oShell As Object
Dim sScript As String, sScriptName As String, sWbName As String
Dim sArguments As String
Dim iRegions As Integer, i As Integer
Dim aRegion() As String, sLink As String

Call clean

Set oShell = CreateObject("WScript.Shell")

sWbName = ThisWorkbook.Name
sScriptName = "scrapingtest.py"
sLink = "http://www.investing.com/currencies/"

iRegions = Application.WorksheetFunction.CountA(Sheet1.Range("rngRegions"))
ReDim aRegion(iRegions - 1)
For i = 1 To iRegions
    aRegion(i - 1) = sLink & CStr(Sheet1.Range("rngRegions").Cells(i, 1).Value)
    sArguments = sArguments & aRegion(i - 1) & " "
Next i

sScript = sScriptName & " " & sWbName & " " & sArguments
oShell.Run ("cmd.exe /S /C " & sScript)

End Sub
Now, if you click "RUN" a console will appear showing a message similar to this one:


Conclusion:
My personal opinion is that at this moment there are not real alternatives to Excel. Is true that there are open source tools like LibreOffice or Apache OpenOffice but these are not even comparable to Excel. It is quite common create scripts in C# to automate processes which are not so fast in VBA. VS languages (VB /C#) are chosen because they are much more integrated with Excel and generally speaking with most of the Microsoft applications.

The aim of this post was show that we can use Python to accelerate the code and simply use Excel as a template where print our results, creating dashboards, reports and so on. Thus, for me Python and Excel/VBA is a good association and I will use it going forward.

No comments:

Post a Comment