Today we are going to talk about how easy it could be run Python's scripts from Excel, which might be considered pretty interesting.
Basically we will create a VBA code which launches a console running the selected script. This script creates an output which must be printed
on the same excel file from where it was launched.
Continuing with the previous post, we will use the scraping script to print the extracted data into excel. First we will change a bit the previous code and we will create a class called "collect". UPDATE: The website has changed and the previous code doesn't work any more!
Finally, we need a VBA code which can run a script through the Windows console. First we will be calling a function to clear ranges contents and the main function afterwards. The main function, called runScript, creates an object Shell and runs the script with its arguments, which are: workbook name and array of urls.
Conclusion:
My personal opinion is that at this moment there are not real alternatives to Excel. Is true that there are open source tools like LibreOffice or Apache OpenOffice but these are not even comparable to Excel. It is quite common create scripts in C# to automate processes which are not so fast in VBA. VS languages (VB /C#) are chosen because they are much more integrated with Excel and generally speaking with most of the Microsoft applications.
The aim of this post was show that we can use Python to accelerate the code and simply use Excel as a template where print our results, creating dashboards, reports and so on. Thus, for me Python and Excel/VBA is a good association and I will use it going forward.
Continuing with the previous post, we will use the scraping script to print the extracted data into excel. First we will change a bit the previous code and we will create a class called "collect". UPDATE: The website has changed and the previous code doesn't work any more!
import numpy as np import urllib2 import urlparse import re from bs4 import BeautifulSoup class collect: def __init__(self, mainSites): self.sites = mainSites def connect(self, site): hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64)', 'Accept': 'text/html,application/xhtml+xml,application/xml' ';q=0.9,*/*;q=0.8', 'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3', 'Accept-Encoding': 'none', 'Accept-Language': 'en-US,en;q=0.8', 'Connection': 'keep-alive'} req = urllib2.Request(site, headers=hdr) page = urllib2.urlopen(req).read() soup = BeautifulSoup(page) return soup def extractUrls(self): FxUrls = [] C1 = [] C2 = [] for mainSite in self.sites: soup = self.connect(mainSite) for link in soup.find_all('td', {'class': 'bold left noWrap'}): path = 'http://www.investing.com/currencies/' url = link.find(href=re.compile(path)).get('href') FxUrls.append(url) curr = urlparse.urlparse(url).path.split("/")[2].split("-") C1.append(curr[0]) C2.append(curr[1]) return np.vstack((FxUrls, C1, C2)) def extractData(self): Value = [] Data = self.extractUrls() for link in Data[0, :]: soup = self.connect(link) for span in soup.find_all('span', {'id': 'last_last'}): value = (span.text).replace(',', '') Value.append(value) return np.vstack((Data, Value))Once we have the script is time to create the interaction with Excel. In order to do this operation we will use an excellent module: pyvot. I have previously prepared some special ranges for printing the data (I recommend Name ranges as a good Excel/VBA practice). The script below will create a connection with Excel, just selecting the desired ranges and sending the values. This script will also need some arguments, the workbook name from where the script is launched and our links.
import sys import scraping import xl #arguments: excelWb = sys.argv[1] print "Excel workbook: ", excelWb mainSites = [] for i in range(len(sys.argv) - 2): mainSites.append(str(sys.argv[i + 2])) print "urls: ", mainSites scr = scraping.collect(mainSites) data = scr.extractData() rngCurr1 = data[1, :] rngCurr2 = data[2, :] rngValue = data[3, :] """Excel connection""" wb = xl.Workbook(excelWb) wb.range("rngCurr1").set(rngCurr1) wb.range("rngCurr2").set(rngCurr2) wb.range("rngRate").set(rngValue)
Excel template |
Ranges |
Finally, we need a VBA code which can run a script through the Windows console. First we will be calling a function to clear ranges contents and the main function afterwards. The main function, called runScript, creates an object Shell and runs the script with its arguments, which are: workbook name and array of urls.
Option Explicit Sub clean() Dim rngForex As Range Set rngForex = Sheet1.Range("rngForex") rngForex.ClearContents End Sub Sub runScript() Dim oShell As Object Dim sScript As String, sScriptName As String, sWbName As String Dim sArguments As String Dim iRegions As Integer, i As Integer Dim aRegion() As String, sLink As String Call clean Set oShell = CreateObject("WScript.Shell") sWbName = ThisWorkbook.Name sScriptName = "scrapingtest.py" sLink = "http://www.investing.com/currencies/" iRegions = Application.WorksheetFunction.CountA(Sheet1.Range("rngRegions")) ReDim aRegion(iRegions - 1) For i = 1 To iRegions aRegion(i - 1) = sLink & CStr(Sheet1.Range("rngRegions").Cells(i, 1).Value) sArguments = sArguments & aRegion(i - 1) & " " Next i sScript = sScriptName & " " & sWbName & " " & sArguments oShell.Run ("cmd.exe /S /C " & sScript) End SubNow, if you click "RUN" a console will appear showing a message similar to this one:
Conclusion:
My personal opinion is that at this moment there are not real alternatives to Excel. Is true that there are open source tools like LibreOffice or Apache OpenOffice but these are not even comparable to Excel. It is quite common create scripts in C# to automate processes which are not so fast in VBA. VS languages (VB /C#) are chosen because they are much more integrated with Excel and generally speaking with most of the Microsoft applications.
The aim of this post was show that we can use Python to accelerate the code and simply use Excel as a template where print our results, creating dashboards, reports and so on. Thus, for me Python and Excel/VBA is a good association and I will use it going forward.
No comments:
Post a Comment