Gunther Cox Salvius the Robot

Salvius is an open-source humanoid robot built from recycled parts.

When good chat bots go bad

Allow me to start off by expunging any possible notion that what you will read in the following post has anything to do with Skynet-like robotic systems. There will likely come a day when I will write further on the topic of artificial intelligence. Today, however, the subject at hand is more aptly described as artificial ineptness.

Last week I started writing code for an open-source chat bot program. My goal was to create something that was basically an open-source version of Cleverbot. There isn't a lot to choose from for someone looking for a program like this, so I thought it would be a cool thing to create.

I wrote a program that could respond to user input by finding the closest match based on past conversations. This worked well when a match existed; however, when presented with a completely new conversation, the program failed to reply with coherent responses. The simplest solution seemed to be to give the program more "conversation experience" so that it had a greater selection of responses to choose from. To improve the program's database of conversations, I hooked it up to Cleverbot's API and set it loose chatting away with Cleverbot.
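
To give a rough idea of what "finding the closest match based on past conversations" can look like, here is a minimal sketch in Python. It is not the code from my repository: the class name, the method names and the choice of difflib's SequenceMatcher for measuring similarity are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


class ClosestMatchBot:
    """Hypothetical sketch: reply with whatever followed the most similar known statement."""

    def __init__(self):
        # Each entry is a (statement, reply) pair taken from a past conversation.
        self.history = []

    def learn(self, statement, reply):
        # Remember that `reply` was once said in response to `statement`.
        self.history.append((statement, reply))

    def respond(self, text):
        # With no conversation experience there is nothing to choose from.
        if not self.history:
            return None

        def score(pair):
            # Similarity between a known statement and the new input (0.0 to 1.0).
            return SequenceMatcher(None, pair[0].lower(), text.lower()).ratio()

        # Pick the pair whose statement is closest to the input and reuse its reply.
        _, reply = max(self.history, key=score)
        return reply


bot = ClosestMatchBot()
bot.learn("How are you?", "I am doing well.")
bot.learn("Do you like music?", "I like seeing movies.")
print(bot.respond("how are you today"))
```

A bot like this can only ever repeat replies it has already seen, which is why feeding it more conversations matters so much.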

The communication between my program and Cleverbot was really interesting to observe. The conversation started out awful and was totally incoherent. However, after about ten iterations of the two talking back and forth, my program started replying with some of the new replies it had "learned" from talking with Cleverbot. This progressed to the point where they could have an almost completely coherent conversation with each other. The only problem was that my program was still only able to provide replies from the selection of statements that Cleverbot had said to it.

I decided to extend the range of sources my program could draw conversation data from. This is where I made the mistake. I decided that Twitter would be the optimal source for the program to pull data from. Tweets are short and sentence-like, and there are plenty of people on Twitter constantly creating new messages and conversations that are all publicly available. Naturally, this seemed like the perfect source of conversation data.

salvius: "hey mc, doing well. Thanks for sharing nmirz new video."
cleverbot: "Are you a dog?"
salvius: "lmao are you a dog person?"
cleverbot: "Who was the who?"
salvius: "Who? Who is but a form following the function of what and what I am is a man in a mask."
cleverbot: "It's not your powers of observation I doubt, but merely the paradoxical nature of asking a masked man who is. But tell me, do you like music?"
salvius: "I like seeing movies."
cleverbot: "I wish I was The Mad Hatter."
salvius: "You're entirely bonkers. But I'll tell you a secret. All the best people are."

The conversations that resulted from pulling conversation data from Twitter's API were not what I had expected. On the positive side, my chat bot was now amazingly up to date with popular culture, even being able to converse about television events that had just premiered (such as the latest Doctor Who episode). While talking with Cleverbot, the two stopped speaking English at one point, preferring to address each other in French. While the responses that the program returned were relevant and coherent, they were also extremely prone to reflect what is probably best described as the chatter of the internet's most profound trolls. A plethora of profanity ensued whenever the subject of a conversation had anything to do with sports, various actors, or politics.

For anyone who is interested, I have published the code for my chat bot program in a repository on GitHub, https://github.com/gunthercox/ChatBot.

Out for a drive...


Robot status page

Check out the latest metrics from Salvius on the status page.

After creating the status page, I wanted to write a little about the technologies that went into it. The status page has been on my to-do list for a while now. I had originally considered hosting it on the robot's server; however, if the server crashed, there would be no way to access the status page. Because the status page is a simple static webpage that pulls from a variety of data sources, it is nearly impervious to complete failure. Even if GitHub Pages, where it is currently hosted, were to become unavailable, the status page could still be opened locally in a browser.

Many websites that provide online services offer status pages that display information about the site's performance and any potential service outages or problems. A few examples are the status pages for GitHub (status.github.com), Disqus (status.disqus.com) and Travis CI (status.travis-ci.com).

Salvius has a number of hardware items that report their status to various online sources. Out of an interest in seeing all of these metrics in one convenient location, I created my own status page, http://gunthercox.github.io/salvius.status/, which graphs data published by the robot.

The page is 100% responsive thanks to Bootstrap and Chart.js. Chart.js is an HTML5 canvas graphing library which has awesome support for different kinds of graphs. Although I only use the line graph, Chart.js supports a total of six different graph types.

The robot's status page pulls from a variety of data sources including Travis CI, Sparkfun's data service, and Twitter. Travis CI provides code testing services which are run each time a change is made to the robot's source code.
I was very interested to try out Sparkfun's data service, which uses an application called Phant to host streams of data submitted by a variety of networked electronic devices. Sparkfun is currently hosting data streams from sources such as homemade weather stations, GPS logging robots and more. The data service is meant to expose the reality of the Internet of Things. The Internet of Things is a term that has recently become popular; it reflects the fact that there are more electronic devices connected to the internet than there are humans on planet Earth. This has been true since 2008, when the number of devices connected to the internet reached 12 billion while the world population was only 6.7 billion. There is no end in sight to the ever-increasing usefulness of interconnected technologies.
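
As a rough illustration of what "pulling from a data source" and graphing it involves, the sketch below reshapes a list of stream records into the labels-plus-datasets layout that a Chart.js line graph expects. The status page itself does this in the browser; the Python here is only a sketch, and the record layout, the "temperature" field and the sample values are all made up for the example.

```python
import json


def to_line_chart(records, field, label):
    """Turn stream records into the labels/datasets shape used by a line graph."""
    # ISO 8601 timestamps sort correctly as plain strings.
    ordered = sorted(records, key=lambda record: record["timestamp"])
    return {
        "labels": [record["timestamp"] for record in ordered],
        "datasets": [
            # Assumes the readings arrive as strings, so convert them to numbers.
            {"label": label, "data": [float(record[field]) for record in ordered]}
        ],
    }


# Hypothetical sample of what a stream might return (not real robot data).
sample = json.loads(
    '[{"temperature": "41.5", "timestamp": "2014-05-02T10:05:00Z"},'
    ' {"temperature": "40.0", "timestamp": "2014-05-02T10:00:00Z"}]'
)

chart = to_line_chart(sample, "temperature", "Temperature")
print(json.dumps(chart, indent=2))
```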

I'm planning to add graphs and metrics to this page as Salvius evolves and more sources of data become available. Idea submissions for graphs are welcome. Issue tracker: https://github.com/gunthercox/salvius.status/issues.

New arm mockup

In addition to a pair of new hands for the robot, I am also experimenting with a new design for the arm. At the moment I have created a mock-up in cardboard so that I can make changes easily. The new arm adds shoulder and wrist joints that were not previously included in the design.

Many thanks to Western New England University for donating the pens which will be used to make two new hands for the robot. For anyone interested in how the pens will be used, check out my original post here. The driving force behind this robot is to find a way for humanoid robots to be constructed at minimal cost by taking advantage of materials that are readily available, and by recycling to help reduce costs.

Happy Earth Day, everyone, and thank you for your continued support!

Redesigned rotary movement for shoulder

I've moved the motor that rotates the robot's arm inside the chest cavity to save space. The previous arm configuration was based on one of my first tests for mounting the robot's arms, which used a motor mounted in the arm. Mounting the motor inside the robot's chest cavity will allow me to reduce the weight of the robot's arms in future versions.

I'm currently modeling a new left arm for the robot which will incorporate more joints for an improved range of motion.

Arc reactor installation complete

The photo above shows the robot's arc reactor which, as of April 1st, 2014, has been fully installed. This upgrade will provide enough electrical output to fuel the robot's tremendous power requirements. Currently I'm using a small sample of palladium to catalyze the fission reaction in the arc reactor; however, it seems that the reactor is accumulating excessive quantities of neutrinos, which have been building up on the surface of the laser emitter array. Once I determine an efficient way to keep neutrinos out of the arc reaction, I estimate that the final output should yield exactly 1.21 gigawatts of clean usable energy.

JPG vs BMP vs PNG vs GIF

This was a quick experiment I did to check the efficiency of a few image formats. For this experiment I base64 encoded a 1 pixel image file of each of the following image file formats: .jpg, .bmp, .png, .gif.

Here are the base64-encoded results for each image:

.jpg - From a 539 byte file
/9j/4AAQSkZJRgABAQEASABIAAD//gATQ3JlYXRlZCB3aXRoIEdJTVD/2wBDAAEBAQEBAQEBA
QEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQE
BAQEBAQEBAQH/2wBDAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBA
QEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQEBAQH/wgARCAABAAEDAREAAhEB
AxEB/8QAFAABAAAAAAAAAAAAAAAAAAAACf/EABQBAQAAAAAAAAAAAAAAAAAAAAD/2gA
MAwEAAhADEAAAAX8P/8QAFBABAAAAAAAAAAAAAAAAAAAAAP/aAAgBAQABBQJ//8QAFB
EBAAAAAAAAAAAAAAAAAAAAAP/aAAgBAwEBPwF//8QAFBEBAAAAAAAAAAAAAAAAAAAAA
P/aAAgBAgEBPwF//8QAFBABAAAAAAAAAAAAAAAAAAAAAP/aAAgBAQAGPwJ//8QAFBABAA
AAAAAAAAAAAAAAAAAAAP/aAAgBAQABPyF//9oADAMBAAIAAwAAABAf/8QAFBEBAAAAAAA
AAAAAAAAAAAAAAP/aAAgBAwEBPxB//8QAFBEBAAAAAAAAAAAAAAAAAAAAAP/aAAgBAgE
BPxB//8QAFBABAAAAAAAAAAAAAAAAAAAAAP/aAAgBAQABPxB//9k=

.bmp - From a 126 byte file
Qk1+AAAAAAAAAHoAAABsAAAAAQAAAAEAAAABABgAAAAAAAQAAAATCwAAEwsAAAAAA
AAAAAAAQkdScwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAIAAAAAAAAAAAAAAAAAAAD///8A

.png - From a 69 byte file
iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAIAAACQd1PeAAAADElEQVQI12P4//8/AAX+Av
7czFnnAAAAAElFTkSuQmCC

.gif - From a 35 byte file
R0lGODdhAQABAIAAAP///////ywAAAAAAQABAAACAkQBADs=

Based on these results it is clear that a 1 pixel gif image is the most efficient format for base64 encoding data.
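
For anyone who wants to check the numbers, the short Python sketch below decodes the gif string from above and compares the file sizes with the length of their base64 text. Base64 turns every 3 bytes into 4 characters, so the encoded length follows directly from the file size. The function name is just something I made up for the example.

```python
import base64
import math

# The 1 pixel gif from above, copied as-is.
GIF_B64 = "R0lGODdhAQABAIAAAP///////ywAAAAAAQABAAACAkQBADs="


def base64_length(n_bytes):
    """Number of base64 characters (padding included) needed for n_bytes of data."""
    return 4 * math.ceil(n_bytes / 3)


# File sizes in bytes, as listed above.
file_sizes = {"jpg": 539, "bmp": 126, "png": 69, "gif": 35}

gif_bytes = base64.b64decode(GIF_B64)
print("decoded gif size:", len(gif_bytes), "bytes")

for name, size in file_sizes.items():
    print(name, size, "bytes ->", base64_length(size), "base64 characters")
```

Since the encoded length depends only on the number of bytes, the smallest file always gives the shortest base64 string.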

Update:
I conducted a further study of the efficiency of these image formats by generating a series of images of each type, ranging from 1 pixel to 2000 pixels in increments of 200. To make the testing go quicker, I created a script in Python that generates a set of images and logs their file size and dimensions. My script also generates a graph based on these logs to help visualize the data.
If you're interested in seeing the code or generating the logs for yourself, you can download the script from https://gist.github.com/gunthercox/9874531.

For this graph, the horizontal axis is the dimensions of the image in pixels, and the vertical axis is the file size.
.jpeg = red
.bmp = black
.png = green
.gif = blue

The data from the logs shows png to be, on average, the most efficient format. Jpegs and gifs have very close performance for smaller images, but jpegs quickly increase in size for larger images. Bitmap images (.bmp) were surprisingly inefficient: while all of these formats had very close sizes for a 1 pixel image, the bmp files grew tremendously larger as the image dimensions increased.
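
The bmp result makes sense once you work out the arithmetic, since an uncompressed bitmap stores every pixel in full. The sketch below estimates the file size under a few assumptions that are mine rather than something taken from the logs: 24-bit uncompressed pixels, square images, rows padded to a multiple of 4 bytes, and a 122 byte header (the 126 byte file above minus one padded 4 byte row).

```python
HEADER_BYTES = 122     # assumed: 126 byte file minus one padded 4 byte row of pixel data
BYTES_PER_PIXEL = 3    # assumed: 24-bit colour, no compression


def bmp_file_size(width, height):
    """Estimated size in bytes of an uncompressed 24-bit bmp."""
    # Each row of pixels is padded up to the next multiple of 4 bytes.
    row_bytes = (width * BYTES_PER_PIXEL + 3) // 4 * 4
    return HEADER_BYTES + row_bytes * height


# Same steps as the experiment: 1 pixel up to 2000 pixels, skipping by 200.
for side in [1] + list(range(200, 2001, 200)):
    print(side, "x", side, "->", bmp_file_size(side, side), "bytes")
```

Under these assumptions the size grows with the square of the side length, from 126 bytes at 1 pixel to roughly 12 megabytes at 2000 pixels, while the compressed formats can shrink large areas of identical pixels.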