Chris Watchman of mobomo.com doing a first headstand with Google Glass. Many thanks to Antonio Zugaldia, Google Glass pioneer at Silica Labs.
Most government projects follow fat, waterfall practices. At a LeanDC meetup, Barg Upender presented his experience at the GSA launching minimum viable products, using a customer-driven feature backlog, and managing short, one-week deployments. You can find the slides here.
Thanks to 37signals and the Re-Wired Group for hosting the Switch Workshop in Chicago last week. Jason, Bob, and Chris introduced us to the Jobs-to-be-Done framework and explored what makes us buy the products and services we buy.

Buyers always switch from one product to another. For example, Facebook users didn’t engage in new social behavior; they simply switched from using the phone and email to using Facebook.
In a nutshell, the #JTBD framework focuses on four “progress making” forces. I’ll use Michael Bleigh’s recently funded Divshot as an example.
Two forces drive change: the push of the current situation and the pull of the new solution.
Two countervailing forces block change: the anxiety of the new solution and the habit of the present.
Divshot has a low barrier to entry, and the people who purchase it are also the people likely to actually use it (designers and engineers). If it turns out you don't enjoy building your UI in the browser, you can always go back to your previous tool. Thus, the "anxiety of the new solution" isn't a big impediment to adoption. For bigger purchases such as custom enterprise software (especially when the purchasing manager and the user are not the same person), overcoming a manager's anxiety of the new solution can be the greatest challenge to seeing your software adopted.
I’ve only scratched the surface of what we learned in the workshop. Next week I’ll blog about fundamental timelines and interview techniques that will fit perfectly with your existing user interview process. Jason Fried, Bob Moesta, and Chris Spiek are exceptional hosts and teachers. If you have a chance to attend a Switch workshop, it’s worth your time.
Building engaging apps on small mobile screens is challenging enough, but how do you build "bite-sized" apps for Google Glass? We collaborated with our partners at dSky9, a publisher of best-in-class apps that encourage outdoor play, and presented several Glass app concepts at the MoDevAsia Disruptathon competition in Hong Kong, where we won the audience-voted award for the ideas with the most "Viral Potential".
You can see our slides, including the app concepts we presented, here.
As I regularly do, I was looking around for new datasets to explore and process with Surfiki when I came across the following:
The Open Code - Digital Copy of DC's Laws
As the author Tom MacWright mentions on his site:
"I couldn’t be happier to write that the project to bring DC’s laws into the digital era and put them in everyone’s hands made a big breakthrough: today you can download an unofficial copy of the Code (current through December 11, 2012) from the DC Council’s website. Not only that, but the licensing for this copy is officially CC0, a Creative Commons license that aims to be a globally-effective Public Domain designation."
That sounds like a GREAT invitation. From my reading of his post, it seems this data was difficult to acquire; he mentions many people, many communications, and much time, all working together to make it available to the public. He goes on to mention:
"What else is there to build? A great smartphone interface. Topic-specific bookmarks. Text analysis. Great, instant search. Mirrored archives to everywhere. Printable copies for DIY and for print-on-demand services. And lots more.
We’re not waiting till then to start though: dc-decoded is a project I started today to finish the long-awaited task of bringing DC’s laws into The State Decoded. The openlawdc organization on GitHub is dedicated to working on these problems, and it’s open: you can and should get access if you want to contribute."
As Intridea is a DC-based firm, it made perfect sense for us to run this data through our own data intelligence processing engine, Surfiki. It is also a perfect opportunity to introduce the Surfiki Developers API, which we are making publicly available as of RIGHT NOW. However, we are manually approving users and apps as we move forward; this will help us with future scaling and give us better insight into the bandwidth required for concurrent and frequent developer operations. I encourage anyone and everyone to create an account. Eventually, all requests will be granted, on a first come, first served basis.
I think it is best that I first explain how we processed the data from The Open Code - Digital Copy of DC's Laws, followed by a general introduction to Surfiki and the Surfiki Developers API.
The initial distribution of The Open Code - Digital Copy of DC's Laws was a set of Microsoft Word documents. This may have been updated to a more ingestion-friendly format by now, though I am not sure. The total number of documents was 51, ranging in size from 317K to 140MB. You may think, "Hey, that's NOT a lot of data"... Sure, that's true; however, I don't think it matters much for this project. From what I gather, it was important just to get the data out there and available, regardless of size. Besides, Surfiki doesn't throw fits over small data or big data anyway.
The first order of business was converting these Microsoft Word documents to text. While Surfiki can indeed read Microsoft Word documents directly, it generally takes a little longer, so any preprocessing is a good thing to do. Here is a small Python script (using catdoc) that will convert the documents.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import glob, re, os

# Collect all Word documents (case-insensitive extension)
f = glob.glob('docs/*.doc') + glob.glob('docs/*.DOC')

outDir = 'textdocs'
if not os.path.exists(outDir):
    os.makedirs(outDir)

# Convert each .doc to plain text with catdoc, writing <name>.txt to textdocs/
for i in f:
    out = outDir + '/' + re.sub(r'.*/([^.]+)\.doc', r'\1.txt', i, flags=re.IGNORECASE)
    os.system("catdoc -w '%s' > '%s'" % (i, out))
With that completed, we now have them as text files. I decided to take a peek into the text files and noticed that there are a lot of "END OF DOCUMENT" lines, which I assume mark individual documents within the larger contextual document. (I know, I know… genius assumption.)
This "END OF DOCUMENT" looks like the following:
For legislative history of D.C. Law 4-171, see Historical and Statutory Notes following § 3-501.
DC CODE § 3-502
Current through December 11, 2012
END OF DOCUMENT
And from my initial count script, there are about 19K lines that read "END OF DOCUMENT". Therefore, I want to split these up into individual files, so that Surfiki can process them as distinct documents for search and trending purposes. The following Python script does the splitting; it also cleans out the '§' character. Note: Surfiki uses both structured storage and unstructured storage for all data, for both business purposes and redundancy. As for business purposes, structured storage allows us to connect with common enterprise offerings, such as SQL Server, Oracle, etc., for data consumption and propagation. As for redundancy, since for a temporal period we persist all data concurrently between the two mediums, should a process fail we can recover within seconds and the workflow can resume unimpeded.
Note: docnames.txt is just a static list of the initial text documents converted from Microsoft Word documents. I chose that method rather than walking the path.
#!/usr/bin/env python
# -*- coding: utf8 -*-

# Replace every key in dic with its value throughout text.
def replace_all(text, dic):
    for i, j in dic.iteritems():
        text = text.replace(i, j)
    return text

# docnames.txt lists the text files produced by the conversion step above.
with open('docnames.txt') as f:
    set_count = 0
    for lines in f:
        filename = str(lines.rstrip())
        with open(filename, mode="r") as docfile:
            file_count = 0
            smallfile_prefix = "File_"
            smallfile = open(smallfile_prefix + str(file_count) + "_" + str(set_count) + '.txt', 'w')
            for line in docfile:
                reps = {'§': ''}                      # strip the section character
                line = replace_all(line, reps)
                if line.startswith("END OF DOCUMENT"):
                    # Close the current document and start the next one.
                    smallfile.close()
                    file_count += 1
                    smallfile = open(smallfile_prefix + str(file_count) + "_" + str(set_count) + '.txt', 'w')
                else:
                    smallfile.write(line)
            smallfile.close()
        set_count += 1
After the above processing (a few seconds of time), I now have over 19K files, ranging from 600B to 600K. PERFECT! I am ready to push these to Surfiki.
It's important to understand that Surfiki works with all types and locations of data: web data (pages, feeds, posts, comments, Facebook, Twitter, etc.), static data locations such as file systems and buckets, as well as streams and databases. In this case we are working with static data: text documents in a cloud storage bucket. Without getting too detailed about the mechanism by which I push these files, on a basic level they are pushed to the bucket, where an agent is watching; once they start arriving, the processing begins.
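Surfiki's actual ingestion agent isn't something I can share here, but as a rough sketch of that watch-and-process pattern, imagine a minimal polling watcher like the following (a local directory stands in for the bucket, and process_document is a hypothetical hook):

#!/usr/bin/env python
# A minimal sketch of the watch-and-process pattern described above.
# This is NOT Surfiki's agent: a local directory stands in for the
# cloud storage bucket, and process_document() is a hypothetical hook.
import os
import time

WATCH_DIR = 'bucket'   # stand-in for the cloud storage bucket

def process_document(path):
    # Hypothetical hook; Surfiki would kick off its processing pipeline here.
    print('processing %s' % path)

seen = set()
while True:
    for name in os.listdir(WATCH_DIR):
        path = os.path.join(WATCH_DIR, name)
        if path not in seen and os.path.isfile(path):
            seen.add(path)
            process_document(path)
    time.sleep(5)      # poll every few seconds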
Since this is textual information, the type of processing matters. In this case I want to use Surfiki's standard set of NLP text processing, rather than any customized algorithms such as topic-specific or statistical classifiers. Surfiki will run that standard processing over this data, along with the set of metrics we provide for all data, and all of it will be available in the Surfiki Developers API.
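To give a flavor of what standard NLP text processing typically involves (tokenization, part-of-speech tagging, term frequencies, and the like), here is a small illustration using NLTK. To be clear, this is not Surfiki's internal pipeline; it is only a stand-in sketch:

# Illustrative only: the flavor of standard NLP steps, sketched with NLTK
# rather than Surfiki's actual (internal) pipeline.
# Requires: pip install nltk, plus nltk.download('punkt') and
# nltk.download('averaged_perceptron_tagger') on first run.
import nltk

with open('File_0_0.txt') as f:     # one of the split documents from earlier
    text = f.read()

tokens = nltk.word_tokenize(text)   # tokenization
tagged = nltk.pos_tag(tokens)       # part-of-speech tagging

# A crude term-frequency view of the document
freq = nltk.FreqDist(w.lower() for w in tokens if w.isalpha())
print(freq.most_common(10))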
Once you go over to the Surfiki Developers API and read through the documentation, you will see how simple it is to use. We are adding datasets on a regular basis, so please check back often. Our near real-time Surface Web data is always available, as are previously processed data sets. If you have ideas, or even data sets we should look at, please let us know by submitting a comment on this post.
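As a taste of what calling a REST API like this looks like from Python, here is a hypothetical example; the base URL, route, parameters, and auth scheme below are placeholders, so consult the actual documentation for the real ones:

# Hypothetical example only: consult the Surfiki Developers API docs
# for the real base URL, routes, parameters, and authentication scheme.
import requests

API_KEY = 'your-api-key'                       # issued once your account is approved
BASE = 'https://api.example.com/surfiki/v1'    # placeholder base URL

resp = requests.get(BASE + '/search',
                    params={'q': 'legislative history', 'dataset': 'dc-code'},
                    headers={'Authorization': 'Bearer ' + API_KEY})
resp.raise_for_status()
print(resp.json())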
If you want to contact us about using Surfiki within your organization, that would be great as well. We can put it behind your firewall, or operate it in the cloud. It is available for most architectural implementations.
Ethics; heavy :) How fitting is this quote from Ambedkar - "History shows that where ethics and economics come in conflict, victory is always with economics. Vested interests have never been known to have willingly divested themselves unless there was sufficient force to compel them."
Certainly, when discussing Big Data we are indeed speaking of economics, so the quote is fitting. This might also explain why the book is less than one hundred pages. ;) I can't help but think this is "too soon": too soon to be discussing ethics in Big Data. Without any real, hard examples in the wild to illustrate the risks, we are only guessing at outcomes. It's difficult to establish what is "ethical" until someone or something is hurt.
Ethics of Big Data - Balancing Risk and Innovation
Let's assume this is a primer, a primer for a larger and more in-depth consideration sometime in the future. With that assumption, this book is a decent, quick read. It lays out a framework for Big Data risks through the eyes of the business, and should serve quite well as a catalyst for discussion.
As for ethics, as stated above, I don't have a good perspective on the range: the range of risk potential, the establishment of normalcy and extremes. So I am left just as I started. I understand that information ethics is an important topic, yet I am unsure what it means within a social data context or a personal data privacy context. The book doesn't cover what I consider a real exploration of ethics, apart from some rare tidbits; for example, Instagram in 2012 saying they own your photos and can do what they wish with them. We, the community of Big Data practitioners and its consumers, lack causation on this topic. It is important to realize that a large portion of ethics has natural and innate constraints as defined by society. Therefore, relatively speaking, there should be only a few areas where ethics, or the question of ethics, is appropriate for data.
A distinction the book barely covers (very little, actually) is that of innate, or personal, privacy. There is nothing ethical or unethical about innate privacy; it exists, with distinction, by default. An example would be our medical records. As our records are continually digitized, they are certainly becoming Big Data. However, what's innate about ethics in this respect is that EVERYONE feels their medical information is private; therefore, there isn't a need to address the ethics. Let's put aside conspiracy theories and the potential for abuse, as ethics do not restrain those who exploit in the first place. Medical records are only one example; there are indeed many more.
The question I want to ask is: what Big Data is actually subject to the standardization of ethics? Standardization, after all, is contrary to the very definition of ethics: "Ethics is the means by which we explore our personal morality." And I do mean explore, as our ethics are subject to change; throughout history, ethics has been a moving target (the Roman Colosseum, man vs. beast, etc.). The book's title is superfluous; a better title might have been "Privacy Practices for Big Data," or "Big Data, Business, and You..."
As you can see, it is a subject I am passionate about, as I find myself getting off track. That is a good example of what this book might elicit in you: questions of practice and judgment.
Using the example of Instagram and their photo debacle: was it ethical for them to post this information in their Terms of Use? Yes, it certainly was, and they did. While you may not like their decision, they were ethical in my opinion. You have a choice to use their service or not, and they were honest in their intent. You may argue that their decision to claim ownership of your photos is unethical. Is it? Is it really? They are a service that you choose to use; they have terms that they define, and you need to abide by those terms when choosing to use their service. That certainly seems ethical to me. I can hear a few of you saying that ethics should apply even when choice exists. That may be so for some, but I am unconvinced. I acknowledge that Instagram wouldn't exist without users, and their decisions should benefit both themselves and their users; however, it remains your choice to use their service.
Back to the book. If you want a primer on Big Data and some of the questions you should likely be asking yourself as a practitioner, this is a good start. I didn't seek information on the author until after I read the book; in my mind I kept saying, "hopefully this author is a philosophy major, or I am going to be a little agitated." Indeed, the author does have a degree in philosophy, so I can keep my criticism to a minimum.
This book was provided free of charge, for the purpose of review, by O'Reilly Media.
Last week, our CEO @since1968 was invited to join the #glassexplorers program (moments before @jkottke). Needless to say, we're really excited about this opportunity and look forward to building some kick-ass apps.
@since1968 You're invited to join our #glassexplorers program. Woohoo! Make sure to follow us - we'll DM in the coming weeks.
— Project Glass (@projectglass) March 29, 2013
We <3 Glass and decided to apply as a team just days before the #ifihadglass deadline. We were under the gun and bulging at the seams with ideas. So we took the "easy" route: make a video. Since Intridea is a 100% remote team, this was actually the exact opposite of easy. However, we managed to pull ourselves together by sheer will and a few simple guidelines. This short gem highlights our Intridea UX team and their willingness to look silly for the greater good.
Thank you, Glass!
Having worked in computer vision (CV) in varying degrees for many years, and knowing some rather heavy hitters working in the field today, I was interested in this book from a reviewer's perspective.
Practical Computer Vision with SimpleCV
I found it encouraging that the author dives right in with some high-level, informative definitions, common challenges, and practical use cases. Less encouraging is that the book misses the mark on low-level detail, theory, and any real in-depth explanation of computer vision itself.
For a beginner, this is a decent title. Be aware, though: if you are a beginner, you will need to embrace a quick rhythm and progression throughout. While the book may lack in providing "an education in theory," it makes up for it with "an education in application."
A few examples within the book did catch my attention, such as the Xbox Kinect material, which is quite relevant given the buzz surrounding that tech and its accessibility. A bonus here is the escape from the Microsoft tools that some may feel make the Kinect undesirable: the clear examples in Python should address concerns about its use for practical training and application outside Microsoft's developer tools ecosystem.
I enjoyed Chapter 7 (Drawing on Images), as this is where I spent a lot of time in the past with image annotations for medical applications: the ability to work with layers, objects, lines, etc. The author did a good job describing the canvas but fell short in the actual drawing sections. The lollipop example was rather crude, and SimpleCV's support for drawing is demonstrably more robust.
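To give a sense of the drawing layer in action, here is a minimal sketch of a lollipop drawn with SimpleCV; the method names follow the SimpleCV 1.x DrawingLayer API as I know it, so treat it as illustrative:

# A minimal "lollipop" via SimpleCV's drawing layer; illustrative only.
from SimpleCV import Image, Color

img = Image((300, 300))              # blank 300x300 canvas
layer = img.dl()                     # the image's default drawing layer
layer.line((150, 280), (150, 120), Color.WHITE, width=3)   # the stick
layer.circle((150, 90), 30, Color.RED, filled=True)        # the candy
img.show()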
A lot of time was spent on histograms, which does make sense to me; I believe, though, that too much time was spent there, which will likely create some discontent given the quick progression I mentioned earlier. I realize histograms are a vital concept within CV, so the attention is understandable in general, just not for this book.
I was disappointed to find very little in the book regarding SURF and SIFT, which are used for feature detection (arguably the most prominent industry application of CV). Arguments over which algorithm performs better or worse may have prevented their inclusion. And while SimpleCV doesn't have an implementation of SIFT, it does have one for SURF; in practice, one of the two will typically be used in feature detection work. The following link is some open source SURF and SIFT work with OpenCV that a friend and I did a few years back, specific to feature detection around brand logos: LINK
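Since SimpleCV does expose SURF, a minimal keypoint detection sketch looks roughly like the following (the file name is a placeholder, and the call reflects the SimpleCV 1.x API as I recall it):

# Minimal SURF keypoint sketch with SimpleCV; 'logo.png' is a placeholder.
from SimpleCV import Image

img = Image('logo.png')                  # any image with distinctive features
kp = img.findKeypoints(flavor='SURF')    # SURF keypoint detection
if kp:
    kp.draw()                            # overlay detected keypoints on the image
    img.show()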
Generally speaking, when you ask beginners why they are interested in computer vision, the answer will most likely be something along the lines of facial recognition and/or object recognition. The author does provide examples and lessons within those more interesting facets. Therefore, the book hits the mark for its intended reader.
This book was provided free of charge, for the purpose of review, by O'Reilly Media.
I always love traveling to DC. Meeting up with fellow Intrideans is incredibly motivating and satisfying. I had that chance yet again over the last couple of days at the Open Analytics Summit, where Intridea was a sponsor and also had a speaking slot. A few other companies were sponsoring and speaking as well, such as Basho, Elasticsearch, and 10Gen.
The morning started pretty slowly; I am certain people were around the corner, just not sure which corner that was. It was an intimate setting, with only a few tables for vendors/sponsors and a select group of practitioners.
Presentations were longer than what I consider normal, at 45-50 minutes. Topics focused, for the most part, on open source software: the applications, architecture, engineering, methods, and systems used within data analytics. As a sponsor it was difficult to get away and listen to the presentations, though a few of them looked as though they would have been quite interesting to attend.
My presentation, titled "Data Science in the NOW - It Takes an ARMY of Tools!", focused mainly on the vast array of available open source databases, indexing engines, file systems, query engines, and streaming engines. I also spent a little time on the definition of "now" (within the context of data analytics), latency, and our own human (physiological) limitations of perception. I made it a point to mention most of what is available, along with their history, general feature sets, strengths, and weaknesses, and I selected a few out of the myriad for special attention, including Storm, Cassandra, HBase, the xQLs, Hadoop, and a few others. The presentation is available for anyone to view on SlideShare. Unfortunately, without the notes attached, a lot of the detail may seem to be missing. If you want to read through the notes that apply to each slide, please just let me know and I will send them to you.
My only gripe is the venue itself. While quaint, it had some real drawbacks. For example, the power outlets were in front of the vendor tables rather than behind them, and a lounging area sat directly in front of the vendor tables with attendees' backs to us, making it rather difficult to engage in useful discussion. Finally, the main presentation room is built on tiers, much like large collegiate classrooms, but each attendee sat behind a small barrier that hid their hands. During the long presentations I noticed a lot of arm/elbow movement indicating QWERTY abuse or thumb wrestling rather than focus on the presenter.
We met a few really cool people that we are already following up with. All in all it was a good event, glad we were part of it.