Sikuli and IPython

At work we have a year's worth of time series data from 3 sensors. Most of the data is organised into 30-minute files and sorted into folders by month. We have some basic, unsupported, proprietary GUI-based Windows software called W@ves21. It does some visualisation and processing of the data on a per-file basis, with no way to process files in batch. I believe batch processing is available as a bespoke paid service.

We came to a situation where we needed to process one month's worth of one type of 30-minute file for each sensor, in excess of 3000 files in total. Doing this by hand would have been a very laborious task, fraught with error.

When collaborating with another organisation, we had used an application called Sikuli to drive a piece of proprietary software called WaveLab when setting up multiple runs. The experiment was a success. Sikuli is based on image recognition, so it doesn't need to understand your application's internals in the way older software automation tools did. It was developed by the AI lab at MIT.

When you are writing a Sikuli script, you capture the areas of the screen that you want Sikuli to act on, for example a button to click. This needs a bit of a shift in thinking, because you have to think about the application in terms of visually unique areas that the image recognition process can match.

You can tweak the fuzziness of the matching algorithm with a sliding percentage scale, so if you have lots of similar items with very small differences, you can increase the percentage to get a stricter match.

Sikuli is Java-based, so it runs on Mac, Linux and Windows. Scripting is done in Jython, so it uses Python syntax. The scripts can be a bit fragile, but the more detail you add, the more robust they become.

You need to be careful when taking scripts onto different machines. Some applications take on the native look of their OS to some extent, which can change how they appear compared with the original images used in the script. If you understand the script's logic, it shouldn't take long to update it accordingly.

Due to a number of issues, mainly the unreliability of the software being driven, the process could not be completed for a whole month in a single run. So after the Sikuli script had finished, we had approximately 1500 files from one sensor, but some of the original files hadn't been processed because of user error when loading them. The problem is: how do you recognise the few files missing from a sequence of over 1000?

The filenames were timestamped as part of the Sikuli process; each filename consisted of ‘sensor_name, year-month-day hour-minute-second’.
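As an illustration of that naming scheme, here is a minimal sketch of building such a name with `datetime.strftime` and recovering the time with `datetime.strptime`. The exact layout (`<sensor_name>_YYYY-MM-DD HH-MM-SS.<ext>`), the `.dat` extension and the helper names `timestamped_name` and `name_to_datetime` are hypothetical; adjust `STAMP_FMT` and the splitting to match the real files.

```python
from datetime import datetime

# Hypothetical layout: "<sensor_name>_YYYY-MM-DD HH-MM-SS.<ext>".
# Hyphens stand in for colons in the time, since Windows does not allow ":" in filenames.
STAMP_FMT = "%Y-%m-%d %H-%M-%S"


def timestamped_name(sensor_name, start, ext="dat"):
    """Build a filename carrying the sensor name and the file's start time."""
    return "%s_%s.%s" % (sensor_name, start.strftime(STAMP_FMT), ext)


def name_to_datetime(filename):
    """Strip the sensor name and extension, then parse the remaining timestamp."""
    stem = filename.rsplit(".", 1)[0]  # drop the extension
    stamp = stem.rsplit("_", 1)[1]     # drop the sensor name (which may itself contain "_")
    return datetime.strptime(stamp, STAMP_FMT)


name = timestamped_name("sensor_A", datetime(2013, 5, 1, 0, 30))
print(name)                    # sensor_A_2013-05-01 00-30-00.dat
print(name_to_datetime(name))  # 2013-05-01 00:30:00
```

Splitting on the last underscore means a sensor name that contains underscores still parses correctly, because the timestamp part never contains one.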

Having dealt a lot with time series files and missing sections in Python, I developed the following approach. Grab the filenames as a list of strings, strip everything but the date and time, and use datetime.strptime to parse a datetime object from each string. Convert each datetime object to a Unix timestamp using time.mktime, then pass the correctly ordered array of Unix timestamps to numpy.diff to get the differences between successive values. Most of the differences are roughly the same (about 1800 seconds for 30-minute files), so you can use a mask to expose the large values and the numpy.where function to find their positions. Because element i of the numpy.diff output is the difference between elements i and i+1 of its input, each position points straight back into the sorted filename array at the last file before a gap. From there you can check whether the original files for those time periods exist.
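A minimal sketch of this approach, assuming the hypothetical naming pattern `<sensor>_YYYY-MM-DD HH-MM-SS.<ext>` (adjust `TIMESTAMP_RE` and `TIMESTAMP_FMT` to the real names). The function names `find_gaps` and `missing_times`, the 1.5× tolerance and the use of the median as the "typical" spacing are illustrative choices. One deliberate deviation: it converts with `calendar.timegm` instead of `time.mktime`, because `mktime` interprets the time as local time, so a daylight-saving change could appear as a spurious one-hour jump; `timegm` treats it as UTC.

```python
import calendar
import re
from datetime import datetime, timedelta

import numpy as np

# Hypothetical naming pattern: "<sensor>_YYYY-MM-DD HH-MM-SS.<ext>".
TIMESTAMP_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}-\d{2}-\d{2})")
TIMESTAMP_FMT = "%Y-%m-%d %H-%M-%S"


def to_datetime(filename):
    """Strip everything but the date and time, then parse it."""
    return datetime.strptime(TIMESTAMP_RE.search(filename).group(1), TIMESTAMP_FMT)


def to_unix(filename):
    """Unix timestamp for a filename, treating its time as UTC."""
    return calendar.timegm(to_datetime(filename).timetuple())


def find_gaps(filenames, tolerance=1.5):
    """Return (last_before_gap, first_after_gap) filename pairs."""
    names = sorted(filenames, key=to_unix)  # keep names and stamps in time order
    stamps = np.array([to_unix(n) for n in names], dtype=np.int64)
    steps = np.diff(stamps)                 # seconds between successive files
    typical = np.median(steps)              # usual spacing, robust to a few gaps
    mask = steps > tolerance * typical      # expose the unusually large steps
    gap_idx = np.where(mask)[0]
    # steps[i] is the difference between names[i] and names[i + 1]
    return [(names[i], names[i + 1]) for i in gap_idx]


def missing_times(before, after, step=timedelta(minutes=30)):
    """Expected start times strictly between two files that bracket a gap."""
    t, end = to_datetime(before) + step, to_datetime(after)
    times = []
    while t < end:
        times.append(t)
        t += step
    return times


files = [
    "sensorA_2013-05-01 00-00-00.dat",
    "sensorA_2013-05-01 00-30-00.dat",
    "sensorA_2013-05-01 02-00-00.dat",  # 01:00 and 01:30 are missing
    "sensorA_2013-05-01 02-30-00.dat",
]
for before, after in find_gaps(files):
    print(before, "->", after, missing_times(before, after))
```

The times returned by `missing_times` are the periods whose original files are worth checking, for example with `os.path.exists` once you know how the original files are named.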

IPython is an invaluable tool for this procedure. Its read-eval-print loop style of iterative coding is just right for this problem, coupled with Python's straightforward syntax and its string, datetime, time, NumPy and os libraries.

Having IPython on a machine is like having a very powerful, easily scriptable terminal. It is becoming a must-have for me on whichever box I'm working on. Anaconda Community Edition is a nice distribution of Python with a lot of the useful packages.

I need to work out a way to run the IPython notebook reliably at work. I can run it with local access, but having an instance that runs securely across the network would be a powerful incentive to encourage other folks to engage with Python.