“Sheldon, if you were a robot, and I knew and you didn’t, would you want me to tell you?” – Howard Wolowitz.
As you’ll have seen from many of the other blog posts, I spend more time than I should thinking about home automation. I put a new smart lock in recently, and as is often the case, the product comes with a phone app that communicates with a cloud service. Just as often, that cloud service has APIs that are neither documented nor intended to be used by anything other than the provided app. It’s a shame, given the work they put into creating those APIs, that they don’t welcome the automation community in using them.
The API in question would randomly respond to a request with PNG data for a CAPTCHA challenge, which would then be displayed in the app. That isn’t fantastic when you’re trying to call the API from an automation component. So this post is about the code I use to process the CAPTCHA image, determine the letters, and then send the response back to the API. There’s a range of machine-learning-based cloud services that provide text extraction from images, including IBM’s Watson Visual Recognition and Microsoft’s Computer Vision. The code example below uses Microsoft’s Computer Vision service hosted in Azure.
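To put that in context, here’s a rough sketch of the shape of that interaction. The vendor’s API is undocumented, so the endpoint, the idea that a CAPTCHA challenge is signalled by an image/png response, and the form field used to send the answer back are all assumptions I’ve made purely for illustration; ProcessCaptcha is the method shown further below.

// Hypothetical sketch only: the real endpoint, content-type check and form
// field belong to an undocumented vendor API and are assumptions here.
// Requires System.Net.Http, System.Collections.Generic and System.Threading.Tasks.
private static readonly HttpClient _httpClient = new HttpClient();

public static async Task<HttpResponseMessage> CallLockApi(string strUrl)
{
    HttpResponseMessage response = await _httpClient.GetAsync(strUrl);

    // Assumption: a CAPTCHA challenge comes back as PNG data rather than JSON
    if (response.Content.Headers.ContentType?.MediaType == "image/png")
    {
        byte[] btCaptcha = await response.Content.ReadAsByteArrayAsync();
        string strCaptcha = await ProcessCaptcha(btCaptcha);

        // Assumption: the answer goes back as a form field named "captcha"
        response = await _httpClient.PostAsync(strUrl, new FormUrlEncodedContent(
            new[] { new KeyValuePair<string, string>("captcha", strCaptcha) }));
    }
    return response;
}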
I’ve removed the detail from the error handling and the cancellation tokens for brevity. The code below uses the Microsoft Computer Vision and SkiaSharp packages. I needed SkiaSharp for resizing the image (sometimes the CAPTCHA image is below the minimum size the Azure service will accept). From memory, the code is fairly similar to one of Microsoft’s examples. Basically you resize the image, call the API, and then poll for the result. The API is designed to extract multiple lines of text from an image, whereas we only need the first line here. Sometimes the API will detect a space between the letters, which I’ve stripped out with the Replace method.
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;
using SkiaSharp;

private static ComputerVisionClient _visionClient;

public static void Initialise(string strVisionKey, string strVisionEndpoint)
{
    _visionClient = new ComputerVisionClient(
        new ApiKeyServiceClientCredentials(strVisionKey)) { Endpoint = strVisionEndpoint };
}

public static async Task<string> ProcessCaptcha(byte[] btCaptcha)
{
    MemoryStream streamSource = null, streamDestination = null;
    SKBitmap sourceBitmap, scaledBitmap;
    SKImage scaledImage;
    ReadInStreamHeaders headers;
    ReadOperationResult results;
    IList<ReadResult> textUrlFileResults;
    int iHeight, iWidth, iNumberOfCharsInOperationId = 36;
    string strOperationLocation, strOperationId, strCaptcha = "";

    // Resize the image to a minimum of 50x50 (the smallest size the Azure service accepts)
    try
    {
        streamSource = new MemoryStream(btCaptcha);
        sourceBitmap = SKBitmap.Decode(streamSource);
        iHeight = Math.Max(50, sourceBitmap.Height);
        iWidth = Math.Max(50, sourceBitmap.Width);
        scaledBitmap = sourceBitmap.Resize(
            new SKImageInfo(iWidth, iHeight), SKFilterQuality.High);
        scaledImage = SKImage.FromBitmap(scaledBitmap);
        streamDestination = new MemoryStream(
            scaledImage.Encode().ToArray());
    }
    catch (Exception eException)
    {
        goto Cleanup;
    }

    // Call the Vision Read API
    try
    {
        headers = await _visionClient.ReadInStreamAsync(
            streamDestination, "en");

        // The operation ID is the last 36 characters of the Operation-Location URL
        strOperationLocation = headers.OperationLocation;
        strOperationId =
            strOperationLocation.Substring(strOperationLocation.Length
            - iNumberOfCharsInOperationId);

        // Poll until the read operation has finished
        do
        {
            results = await _visionClient.GetReadResultAsync(
                Guid.Parse(strOperationId));
        }
        while (results.Status == OperationStatusCodes.Running ||
            results.Status == OperationStatusCodes.NotStarted);

        // Take the first line of recognised text and strip any spaces
        textUrlFileResults = results.AnalyzeResult.ReadResults;
        foreach (ReadResult page in textUrlFileResults)
        {
            foreach (Line line in page.Lines)
            {
                strCaptcha = line.Text.Trim().Replace(" ", "");
                goto Cleanup;
            }
        }
    }
    catch (Exception eException)
    {
        goto Cleanup;
    }

Cleanup:
    streamSource?.Dispose();
    streamDestination?.Dispose();
    return strCaptcha;
}
The whole code segment is called upon receiving a CAPTCHA request. I let it try twice (one attempt per image, across two images), and if that’s unsuccessful it routes the image for manual assessment (at which point you need a way of providing that input back to the API). I’ve not had that situation occur yet, however.
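In sketch form, that retry logic looks something like the below. FetchCaptchaImage, SubmitCaptchaAnswer and QueueForManualAssessment are placeholder names for whatever your own integration provides, not real methods from the lock’s API.

// Hypothetical sketch of the retry flow: two attempts, each against a freshly
// fetched image, then fall back to a human if neither answer is accepted.
public static async Task SolveCaptchaWithRetry()
{
    byte[] btCaptcha = null;

    for (int iAttempt = 0; iAttempt < 2; iAttempt++)
    {
        btCaptcha = await FetchCaptchaImage();               // placeholder: request a fresh CAPTCHA image
        string strCaptcha = await ProcessCaptcha(btCaptcha); // OCR via the Azure Computer Vision code above

        if (!string.IsNullOrEmpty(strCaptcha) && await SubmitCaptchaAnswer(strCaptcha))
            return;                                          // placeholder: the API accepted the answer
    }

    // Neither attempt worked, so hand the last image off for manual assessment
    await QueueForManualAssessment(btCaptcha);
}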
~ Mike