We continue the Under the Hood series with a Q&A with Lymbix CTO and Hot Sh!t List member Josh Merchant (@joshmerchant, LinkedIn). (Disclosure: I sit on the Board of Directors for Lymbix and helped them with their application/acceptance to the Microsoft BizSpark One program). Lymbix has raised approximately $3.8MM in funding from GrowthWorks and other angel investors. The Lymbix team is 18 people based in Moncton, NB and continues to grow.
Lymbix Sentiment Intelligence measures the tone and emotional impact of words in everyday written language. Lymbix is a global leader in sentiment analysis technology, and applications powered by Lymbix provide a more definitive look at specific emotions like friendliness, enjoyment, amusement, contentment, sadness, anger, fear, and shame, and give insight into the true meaning of what brings positive and negative results. In short, Lymbix delivers incredibly fast sentiment analysis and can identify the real emotion in any domain of text, providing clarity and confidence at the level of an individual message.
An engine that analyzes emotion in text. Simply put, we’ve built an emotional spell check that we call ToneCheck, which looks into the emotions written in email communications, lifting out how someone may feel, or rather the “tone” they’ll perceive, when they read the message. This technology is built on our core engine, which is available as an API for partners who want sentiment analysis that better understands how users express themselves. As a business, Lymbix is building better business communication tools and reporting for companies to analyze communication in sales, human resources, and customer support. Think of it like an insurance package fitting nicely into your risk management profile.
How the Technology Works
We use an array of techniques to train our systems to better understand the emotional interactions in everyday communication. We analyze streams of data, whether from Facebook, Twitter, emails, blogs, or the news, and dissect elements of “emotive context”, meaning a snippet of text that can cause emotional arousal in an individual. This is the linguistics component of our system. We believe in human-powered insight, so we then take a slew of emotive context and blast it through our own crowd-sourced network called ToneADay.com. We have just shy of 10k raters who give us their opinions of both “real” and “fake” emotive context to gauge the levels of emotion that can occur based on parameters such as frequencies, demographics, 8 primary emotions and so forth. We then build emotional lexicons, which give us the power to test any incoming query for emotional relevancy. We then apply our “emotional reaction algorithms” to work out how different emotions contribute to the degrees of emotion in the query. Whenever the system detects something it has never heard of, it quickly takes action and tries to learn it. In effect, the system gets smarter the more it’s used.
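To make the lexicon idea concrete, here is a minimal Python sketch of scoring a query against an emotional lexicon. It is not Lymbix's engine or its “emotional reaction algorithms”: the lexicon entries, the weights, and the names LEXICON and score are all invented for illustration, and simple addition stands in for whatever the real system does. The only thing taken from the post is the shape of the idea: look up known emotive phrases, accumulate per-emotion intensity, and set aside anything unrecognized so it can be learned later.

```python
import re
from collections import defaultdict

# Hypothetical lexicon: phrase -> {emotion: intensity in [0, 1]}.
# All entries and weights are invented for illustration.
LEXICON = {
    "thank you": {"friendliness": 0.8, "contentment": 0.4},
    "unacceptable": {"anger": 0.9, "shame": 0.3},
    "worried": {"fear": 0.7, "sadness": 0.4},
}


def score(text, lexicon=LEXICON, max_words=2):
    """Return (per-emotion totals, unrecognized words) for a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    totals = defaultdict(float)
    unknown = []
    i = 0
    while i < len(words):
        # Prefer the longest lexicon phrase that starts at position i.
        for n in range(min(max_words, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in lexicon:
                for emotion, weight in lexicon[phrase].items():
                    totals[emotion] += weight
                i += n
                break
        else:
            # No phrase matched: a candidate for the "learn it" queue.
            unknown.append(words[i])
            i += 1
    return dict(totals), unknown
```

Calling score("Thank you, but this delay is unacceptable") would pick up “thank you” and “unacceptable” from the toy lexicon and return the remaining words as unknown. A real system would presumably filter common words before queueing anything for rating; this sketch does not.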
We’re hosted on Rackspace, as well as Azure. With Rackspace we have a cloud and private hosted solution giving us the elastic scalability that we need to service this type of NLP on a massive scale. We’re a nice blend of Ruby, Java, and C#. Sounds gross, but for us, the solution fits quite nicely.
For horizontal scaling efforts (our API, and freemium ToneCheck users) we use multiple nodes replicated as our “workers”, running on Red Hat and served by Apache. Sinatra is used to handle the REST calls (essentially the wrapper), harnessing Java and linking through sockets to provide really fast linguistic calculations on requests. We persist resident data through Redis, and pull sync jobs to migrate it up to the master datastore. These “nodes” are effectively spun up and down as we predict traffic congestion. We take full advantage of Rackspace load balancers to handle distribution of these requests. We monitor this bad boy with CloudKick, probably the best monitoring and performance analytics tool we’ve come across.
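The wrapper-over-sockets pattern described above can be sketched in a few lines. The following Python example is an illustration only: the real stack uses Sinatra in front of Java, and the wire format here (one JSON object per line), the names EngineHandler, start_engine and analyze, and the word-count “analysis” are all assumptions made so the sketch is runnable.

```python
import json
import socket
import socketserver
import threading


class EngineHandler(socketserver.StreamRequestHandler):
    """Stand-in for the linguistic engine: one JSON line in, one JSON line out."""

    def handle(self):
        request = json.loads(self.rfile.readline().decode("utf-8"))
        # Placeholder "analysis": just count the words in the text.
        reply = {"words": len(request["text"].split())}
        self.wfile.write((json.dumps(reply) + "\n").encode("utf-8"))


def start_engine(host="127.0.0.1", port=0):
    """Start the stand-in engine on a background thread (port 0 = any free port)."""
    server = socketserver.ThreadingTCPServer((host, port), EngineHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def analyze(text, address, timeout=2.0):
    """What a REST wrapper might do per request: forward the text, return the reply."""
    with socket.create_connection(address, timeout=timeout) as conn:
        conn.sendall((json.dumps({"text": text}) + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return json.loads(reader.readline())
```

A REST handler would call analyze(text, server.server_address) and serialize the result back to the caller. Opening a new connection per request keeps the sketch short; a production wrapper would more likely keep connections open or pool them.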
For ToneCheck (pro/business), we’re deployed on Azure. This works well for our business customers, giving them better peace of mind through no data persistence, enterprise integration (at the domain level), and security. Essentially we’ve built a RESTful service on a Web Role that wraps the same Java logic as in our cloud. We have Worker Roles to do some of the heavy lifting, but we try to keep things in the Web Role for high-priority, super fast response times.
As our system is ever evolving in terms of understanding new emotive context, we use our own sync services to deploy lexicons across all our worker nodes (Azure & Rackspace). To build the lexicons, we need massive power, so we use a big hypervisor that runs all our “secret sauce” algorithms against our datastore. We have 3 layers of databases in our system, which seems crazy, but each has a niche. MySQL holds basic user data for our apps and all the boring data we need to keep. Mongo is our dynamic datastore that’s used for all our linguistic data and everything we need to build our lexicons; it is sharded for optimization and for running our MapReduce jobs. We also keep a Hadoop datastore for all the new language we’re processing, for reporting and for running massive queries for some of our “in the making” linguistic calculations/improvements.
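As a rough picture of how a MapReduce-style job could turn crowd ratings into lexicon entries, here is a small in-memory Python sketch. The record layout (phrase, emotion, intensity on a 0 to 10 scale), the sample data, and the choice of a plain average are assumptions for illustration; the actual jobs and algorithms are not described in this post.

```python
from collections import defaultdict

# Hypothetical crowd ratings: (phrase, emotion, intensity from 0 to 10).
# The data is invented for illustration.
RATINGS = [
    ("thank you", "friendliness", 8),
    ("thank you", "friendliness", 6),
    ("thank you", "contentment", 4),
    ("unacceptable", "anger", 9),
]


def map_rating(rating):
    """Map step: key each rating by (phrase, emotion)."""
    phrase, emotion, intensity = rating
    return (phrase, emotion), intensity


def reduce_ratings(values):
    """Reduce step: average every intensity that shares a key."""
    return sum(values) / len(values)


def build_lexicon(ratings):
    """Run map, group by key, then reduce into phrase -> {emotion: score}."""
    grouped = defaultdict(list)
    for key, value in map(map_rating, ratings):
        grouped[key].append(value)
    lexicon = defaultdict(dict)
    for (phrase, emotion), values in grouped.items():
        lexicon[phrase][emotion] = reduce_ratings(values)
    return dict(lexicon)
```

With the sample data, build_lexicon(RATINGS) averages the two “friendliness” ratings for “thank you” and passes the single ratings through unchanged. On a sharded datastore the map and reduce steps would run per shard and be combined, which is why keying by (phrase, emotion) matters.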
Our development practices are pretty neat. We use continuous integration to achieve higher standards of quality for all our apps. We’re a little old school, still using some SVN repos to manage our data (Beanstalk rocks), but now we’re starting to migrate more to Git. The team is divided up into sub-teams, which are all managed independently, and constantly on two-week (global) dev cycles. We do all our project management through Pivotal Tracker, and have wicked fun demo days at the end of every cycle showcasing each team’s improvements and brainiac innovations to everyone (while consuming beer and pizza). Our team is very passionate about the problem we’re trying to solve, technology, and code. We’re split about 50/50 Android & iPhone, so that pretty much says it all!
If you’re running a mail client (Outlook, Gmail, or Lotus Notes) you can try ToneCheck to minimize the “cost” of dealing with misunderstandings.
Interested in being profiled in our Under the Hood series? We are actively looking for Canadian startups building “interesting” technologies and solving “interesting” problems. Contact me by completing your initial Under the Hood submission.