Who better to give an expert assessment of where we stand with Artificial Intelligence and where we’re headed (fast) than a lead technologist at a company using machine learning and lots and lots of data to make home life more efficient and affordable? Ghinwa Choueiter is the kind of person who gets a serious thrill out of elegant math, geeks out on efficiency, and can tell the difference between deep and shallow learning architectures.
What are your latest thoughts on AI and Machine Learning? And what about Natural Language Processing (NLP)? Do you think it’s important the public knows the difference?
Natural Language Processing (NLP) is a subfield of Machine Learning, which is itself a subfield of AI. Traditionally, the goal of AI was to show that machines could exhibit intelligence, reasoning, and emotion. In the 1950s, AI pioneers certainly proved that machines could be intelligent by teaching them to solve mathematical problems and to play games such as checkers and chess better than humans.
Today, with the advent of autonomous vehicles and digital assistants that can recognize our speech and understand our intent, it’s easy to assume that AI domination is right around the corner. There are even individuals such as Elon Musk who fear that AI will take over the human race.
Granted, many achievements out there are not the most life-changing. Take targeted advertising, for example: do you really want to be stalked by that shoe you looked at once? But when done right, targeted advertising helps companies increase sales and revenue. There are also endless technological endeavors that affect our lives positively: language translation and language learning tools help bring people together, autonomous vehicles could be a solution for the elderly to get around, data-driven job search algorithms already help individuals find work faster, and so on.
The availability of big data and cheap computing resources coupled with the re-emergence of neural nets (deep learning) has led to computers easily learning patterns and performing certain tasks, such as speech and image recognition, sometimes even better than humans.
I do, however, believe that a lot of the intelligent tasks that computers have perfected are still narrow in their capabilities and it will be a couple of decades before AI reaches its full potential and machines can fully mimic humans (e.g. have a long open-ended conversation with a human or another machine).
I think the general public is aware of the power of Machine Learning because they interact with its products in their everyday lives (e.g. Alexa, Siri).
More people should learn about AI and join the discussion on its socioeconomic impact on society.
We need people to manage robots and analyze the large amount of data generated by the AI processes. As I mentioned earlier, AI is spreading but will not do so overnight. It will take time and we should use that time to discuss AI’s impact on the workplace and the steps needed to transition future generations into more technical vocations.
When you build out your team, what are you looking for in a data scientist?
We look for candidates who have some background in mathematics and statistics, have encountered one or more problems where they’ve used machine learning intelligently, and aren’t afraid of diving into noisy data in a mathematically sound way.
There are lots of data science and machine learning classes available online and these are wonderful opportunities for people to learn about these topics. That said, we encourage aspiring data scientists to apply their skills to real-life noisy data outside of a classroom setting.
In addition to these general requirements, given the nature of the electrical input we use to train our models, we’re interested in applicants who have experience with signal processing and time-series. Designing machine learning algorithms to cluster and model time-series is more challenging than doing so for discrete signals.
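To illustrate one reason time-series are harder: two recordings of the same underlying event can be stretched or shifted in time, so a naive point-by-point distance misleads clustering. Dynamic time warping (DTW) is one standard way to compare such sequences. The sketch below is purely illustrative and says nothing about Sense’s actual methods:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    Unlike a point-by-point Euclidean distance, DTW tolerates the
    stretching and shifting that make raw time-series hard to cluster.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # stretch a
                                 cost[i, j - 1],      # stretch b
                                 cost[i - 1, j - 1])  # match step
    return cost[n, m]

# A time-shifted ramp looks "far" point-by-point but is identical under DTW.
ramp = [0, 0, 1, 2, 3, 3]
shifted = [0, 1, 2, 3, 3, 3]
print(dtw_distance(ramp, shifted))  # → 0.0
```

This quadratic dynamic program is the textbook formulation; real pipelines typically add windowing constraints or lower-bounding to keep it fast at scale.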
While we enjoy learning about achievements such as Twitter-based sentiment analysis from our candidates, we find that projects that use deep learning architectures to learn sequence models, for example, are much more relevant to the problem we are trying to solve.
Finally, since Sense is a start-up, there is no dedicated research team; every member of the data science team does the research work for their project, prototypes it, and then writes production code that is peer-reviewed. Any future member of the team needs to be comfortable with this process.
With connected homes on the rise, what’s the role of data in this picture?
There is no substitute for data, and it should be the most valuable asset of any IoT company. Case in point: let me tell you how important data is at Sense.
Sense is a home energy monitor that gives its users detailed insights about their energy usage. It helps them identify high-consuming devices (boy does that roof heat cable consume a lot of power!), and it allows them to set appliance-tracking notifications (laundry is done!).
The monitor records current through transformers that snap around the two mains in an electrical panel. Another two cables measure the voltage signal. All four cables (two current, two voltage) record signals at a high level of detail, and this data is indispensable.
In order to give our users information about each of their appliances separately—as opposed to overall household electricity—the data science team takes advantage of the fact that different home appliances have different “electrical signatures.” A toaster and a space heater are both heat devices, but they certainly consume power differently.
We look for subtle changes in incoming data and model the signatures of different appliances.
Our appliance detection process, which uses various Machine Learning algorithms, is tailored to the data of each home, and as is the case with most Machine Learning applications, the performance of our algorithms is tied to the amount and quality of data we record. Without data, we could only conjecture the electrical signature of different appliances and attempt to create templates for each.
However, template-based pattern matching has long been dethroned by statistical, data-driven approaches, and those are the solutions we have adopted at Sense.
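As a toy illustration of the data-driven idea (with entirely made-up features and numbers, and not Sense’s actual algorithm), one could reduce each detected turn-on event to a small feature vector and assign it to the appliance whose centroid, learned from previously observed events, is nearest:

```python
import numpy as np

# Hypothetical training events: each event is a feature vector of
# (real power step in watts, startup-spike-to-steady ratio).
train_events = {
    "toaster":      [(820.0, 1.05), (790.0, 1.10), (805.0, 1.08)],
    "space heater": [(1480.0, 1.02), (1510.0, 1.01), (1495.0, 1.03)],
}

# "Training" here is simply fitting one mean vector (centroid) per appliance
# from the observed events, rather than hand-crafting a fixed template.
centroids = {name: np.mean(ev, axis=0) for name, ev in train_events.items()}

def classify(event):
    """Assign a new event to the appliance with the nearest learned centroid."""
    return min(centroids,
               key=lambda name: np.linalg.norm(np.array(event) - centroids[name]))

print(classify((1500.0, 1.02)))  # → space heater
```

The point of the sketch is the contrast with templates: the model parameters come from data, so they adapt as more events from a particular home are recorded.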
What’s challenging about the Sense problem when compared to more traditional machine learning tasks, such as speech recognition, is the lack of ground truth, or information that’s gathered through direct observation as opposed to inference. Almost anyone can listen to an audio clip and transcribe it. Very few people can look at an electrical signal and tell you exactly what is going on. This means that the data science team has to 1) use unsupervised techniques, ones that do not rely on labeled data, to learn appliance models, and 2) come up with innovative ways to collect ground truth, or labeled data.
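To make the first point concrete, here is a minimal unsupervised sketch (with hypothetical numbers, not Sense’s pipeline): k-means can group unlabeled power-step events into clusters that plausibly correspond to distinct appliances, without anyone ever naming them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unlabeled power-step events (watts): nobody has told us
# which appliance produced which event.
events = np.concatenate([
    rng.normal(800.0, 15.0, 50),    # unknown appliance A
    rng.normal(1500.0, 20.0, 50),   # unknown appliance B
]).reshape(-1, 1)

def kmeans(x, k, iters=20):
    """Plain k-means on 1-D features: group events without any labels."""
    # Crude but deterministic init: spread starting centers across the range.
    centers = np.linspace(x.min(), x.max(), k).reshape(k, 1)
    for _ in range(iters):
        labels = np.argmin(np.abs(x - centers.T), axis=1)   # nearest center
        centers = np.array([x[labels == j].mean(axis=0) for j in range(k)])
    return centers, labels

centers, labels = kmeans(events, k=2)
print(sorted(np.round(centers.ravel())))  # two cluster means, near 800 and 1500
```

Each cluster is a candidate “mystery appliance”; attaching a human-readable name to it is exactly the labeling problem the second point addresses.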
That’s where our users come in. Although we might not be sure what the mystery appliance is, we do provide our users with our top guesses, and we also generate various usage stats for that appliance. This helps them figure out what the appliance is, and in turn they help us get labeled data that is fed back into our training algorithms.
Can you remember when you fell in love with data science?
My path into data science came through automatic speech recognition (ASR), which is my first love. I wrote my first digit recognition software in Matlab as an undergraduate at the American University of Beirut, and I remember training it on just 10 instances per digit (so much for big data!).
Watching Hidden Markov Models—those most elegant mathematical models—at work was the best feeling ever and I knew I was hooked.
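For readers curious what that elegance looks like, the heart of an HMM is the forward recursion, which scores an observation sequence against a model in just a few lines. The two-state example below uses made-up parameters and is nothing like a real digit recognizer:

```python
import numpy as np

# A toy two-state HMM with invented parameters, purely for illustration.
pi = np.array([0.6, 0.4])                 # initial state probabilities
A  = np.array([[0.7, 0.3],                # state transition matrix
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],                # emission probs: P(obs | state)
               [0.2, 0.8]])

def forward_likelihood(obs):
    """P(observation sequence | model) via the forward recursion."""
    alpha = pi * B[:, obs[0]]             # initialize with the first symbol
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]     # propagate, then emit
    return alpha.sum()

# In a digit recognizer, one HMM is trained per digit and the model
# assigning the highest likelihood to the audio wins.
print(forward_likelihood([0, 0, 1]))
```

Real ASR systems add continuous emission densities and Viterbi decoding on top, but this recursion is the mathematical core that made HMMs dominate the field for decades.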
The project encouraged me to apply to the Spoken Language Systems (SLS) group at MIT. After joining SLS, I continued to dive deeper into ASR and did my Master’s and PhD theses on acoustic and lexical modeling respectively.
My first job after graduate school was at Vlingo, a Cambridge-based startup that developed a voice-to-text mobile app. It was a perfect fit with my academic training. I got to play with speech recognition models and data from more than 10 languages.
After Vlingo, I joined DataXu, an ad-serving start-up, and that’s where I first came across the term “Data Science,” which had become a field of its own. With DataXu’s platform handling millions of ad transactions per day, I worked on Big Data problems and technologies. I developed algorithms to automatically run bidding campaigns while optimizing for spend and cost.
Currently, I’m the data science team lead at Sense, and having joined the company very early, I’ve watched the Sense monitor installations grow from a dozen employee houses to thousands of real users across the United States and beyond. Every step I’ve taken in my career has been for a reason, and my reasons have changed as I’ve evolved as a person and a technologist. I joined Vlingo because I wanted to apply my academic training to real-life data. I went to DataXu because I wanted to expand my comfort zone and learn about Big Data technologies.
I joined Sense because I wanted to make a meaningful career step that would have an impact on this planet and future generations.
I was also looking for a challenging problem and a strong team that would also foster my professional growth. I am definitely fortunate to have found all my requirements in my current workplace.
Read more by G. Choueiter at the Sense blog.