UPDATE on May 5, 2023: OpenCV 4.7.X breaks the demonstration; the updated link to the code is this.
We wish we could say the name of a particular brand of construction bricks without facing repercussions from the legal team at that famous plastic-brick company, which publishes such fair play rules. Unfortunately, the road to hell is undoubtedly paved with good intentions. The link will serve for search engines to pick up the brand name, which we will not mention for fear of breaching those fair play rules, indeed written in a dark tongue.
Yes, we will build fiducial ArUco markers with plastic bricks and check that they can be detected in real time while we build them. This could be useful for creating flexible tagging elements that need to be both configurable and durable, more so than a paper-printed ArUco marker; who has a printer, an actual 2D paper printer, available in 2022 anyway?
Remember that the original ArUco marker definition is in this article by researchers at the University of Cordoba in Spain. The article shows this image:
In the image, we can see the various proposed forms of computer-readable codes that are both easy to detect and difficult for a computer to "hallucinate"; that is, it is difficult to generate a false positive, a detection of a marker where there is no marker. ArUco proposes the automatic generation of the marker dictionary and enables the easy generation of detectable and identifiable markers, so easy that we can build them by hand. We will demonstrate the process using a Google Colab notebook, as it can display camera images, with an unavoidable delay, directly in our web browser.
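As a taste of how easy generation is, this minimal sketch (our own illustration, using the same pre-4.7 OpenCV ArUco API as the rest of this post) renders marker ID 8 from the 4x4 dictionary as a 200x200 pixel image we could copy brick by brick; the output file name is arbitrary:
import cv2
# Pre-4.7 OpenCV ArUco API, consistent with the code below:
aruco_dict = cv2.aruco.Dictionary_get(cv2.aruco.DICT_4X4_100)
# Marker ID 8 from the 4x4 dictionary, rendered at 200x200 pixels:
marker_image = cv2.aruco.drawMarker(aruco_dict, 8, 200)
cv2.imwrite('marker_8.png', marker_image)  # Arbitrary output file name.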
We will need the following modules for the detection and computation tasks:
import base64
import io
import time
import numpy as np
from PIL import Image
import cv2
from string import Template
And these modules are needed to execute JavaScript code and render HTML properly. HTML is not explicitly used in the code; it is a hidden dependency that will prevent the video stream from showing if it is missing:
import html
from IPython.display import display, Javascript
from google.colab.output import eval_js
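eval_js is the bridge between the notebook's Python kernel and the browser: it evaluates a JavaScript expression in the page and hands the result back to Python. A quick sanity check, purely illustrative, run inside Colab:
print(eval_js('1 + 1'))  # Prints 2, computed by the browser, not by Python.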
The main video capture code is written in JavaScript, introduced into a Python string template, and executed with IPython.display.Javascript. We are not JavaScript experts, nor JavaScript fans; even if JavaScript is a serviceable educational tool for learning to program, in close proximity to Python it shows all its ugliness. This code is adapted from this external post:
def start_input(video_width=512, video_height=512):
    js_script = Template('''
    var video;
    var div = null;
    var stream;
    var captureCanvas;
    var imgElement;
    var labelElement;
    var pendingResolve = null;
    var shutdown = false;

    function removeDom() {
        stream.getVideoTracks()[0].stop();
        video.remove();
        div.remove();
        video = null;
        div = null;
        stream = null;
        imgElement = null;
        captureCanvas = null;
        labelElement = null;
    }

    function onAnimationFrame() {
        if (!shutdown) {
            window.requestAnimationFrame(onAnimationFrame);
        }
        if (pendingResolve) {
            var result = "";
            if (!shutdown) {
                captureCanvas.getContext('2d').drawImage(video, 0, 0, $video_width, $video_height);
                result = captureCanvas.toDataURL('image/jpeg', 0.8);
            }
            var lp = pendingResolve;
            pendingResolve = null;
            lp(result);
        }
    }

    async function createDom() {
        if (div !== null) {
            return stream;
        }
        div = document.createElement('div');
        div.style.border = '2px solid black';
        div.style.padding = '3px';
        div.style.width = '100%';
        div.style.maxWidth = '600px';
        document.body.appendChild(div);

        const modelOut = document.createElement('div');
        modelOut.innerHTML = "<span>Status:</span>";
        labelElement = document.createElement('span');
        labelElement.innerText = 'No data';
        labelElement.style.fontWeight = 'bold';
        modelOut.appendChild(labelElement);
        div.appendChild(modelOut);

        video = document.createElement('video');
        video.style.display = 'block';
        video.width = div.clientWidth - 6;
        video.setAttribute('playsinline', '');
        video.onclick = () => { shutdown = true; };
        stream = await navigator.mediaDevices.getUserMedia(
            {video: { facingMode: "environment"}});
        div.appendChild(video);

        imgElement = document.createElement('img');
        imgElement.style.position = 'absolute';
        imgElement.style.zIndex = 1;
        imgElement.onclick = () => { shutdown = true; };
        div.appendChild(imgElement);

        const instruction = document.createElement('div');
        instruction.innerHTML =
            '<span style="color: red; font-weight: bold;">' +
            'Click here or on the video window to stop stream.</span>';
        div.appendChild(instruction);
        instruction.onclick = () => { shutdown = true; };

        video.srcObject = stream;
        await video.play();

        captureCanvas = document.createElement('canvas');
        captureCanvas.width = $video_width; //video.videoWidth;
        captureCanvas.height = $video_height; //video.videoHeight;
        window.requestAnimationFrame(onAnimationFrame);

        return stream;
    }

    async function takePhoto(label, imgData) {
        if (shutdown) {
            removeDom();
            shutdown = false;
            return '';
        }
        var preCreate = Date.now();
        stream = await createDom();
        var preShow = Date.now();
        if (label != "") {
            labelElement.innerHTML = label;
        }
        if (imgData != "") {
            var videoRect = video.getClientRects()[0];
            imgElement.style.top = videoRect.top + "px";
            imgElement.style.left = videoRect.left + "px";
            imgElement.style.width = videoRect.width + "px";
            imgElement.style.height = videoRect.height + "px";
            imgElement.src = imgData;
        }
        var preCapture = Date.now();
        var result = await new Promise(function(resolve, reject) {
            pendingResolve = resolve;
        });
        shutdown = false;
        return {'create': preShow - preCreate,
                'show': preCapture - preShow,
                'capture': Date.now() - preCapture,
                'img': result};
    }
    ''')
    js = Javascript(js_script.substitute(video_width=video_width,
                                         video_height=video_height))
    display(js)
We need to handle the JavaScript webcam image response by taking a "photo" of the currently captured frame. Then, with get_drawing_array, we add a new channel to the image with an overlay of the image returned by a detection function, in this case detect_markers:
def take_photo(label, img_data):
    js_function = Template('takePhoto("$label", "$img_data")')
    data = eval_js(js_function.substitute(label=label, img_data=img_data))
    return data

def js_reply_to_image(js_reply):
    # The reply carries a 'data:image/jpeg;base64,...' URL; keep only the payload:
    jpeg_bytes = base64.b64decode(js_reply['img'].split(',')[1])
    image_PIL = Image.open(io.BytesIO(jpeg_bytes))
    image_array = np.array(image_PIL)
    return image_array

def get_drawing_array(image_array, video_width=512, video_height=512):
    # Note the (rows, columns) order: height first, width second.
    drawing_array = np.zeros([video_height, video_width, 4], dtype=np.uint8)
    drawing_array = detect_markers(image_array, drawing_array)
    # Make every drawn pixel opaque and everything else transparent:
    drawing_array[:, :, 3] = (drawing_array.max(axis=2) > 0).astype(int) * 255
    return drawing_array

def drawing_array_to_bytes(drawing_array):
    drawing_PIL = Image.fromarray(drawing_array, 'RGBA')
    iobuf = io.BytesIO()
    drawing_PIL.save(iobuf, format='png')
    var_js = str(base64.b64encode(iobuf.getvalue()), 'utf-8')
    fixed_js = Template('data:image/png;base64, $var_js')
    drawing_bytes = fixed_js.substitute(var_js=var_js)
    return drawing_bytes
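To check the encoding helpers without a live camera, we can round-trip a dummy overlay; this little test is our own illustration and not part of the original notebook:
dummy_overlay = np.zeros([512, 512, 4], dtype=np.uint8)
data_url = drawing_array_to_bytes(dummy_overlay)
print(data_url.startswith('data:image/png;base64'))  # True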
The marker detection and overlay function is the following:
# Pre-4.7 OpenCV ArUco API; see the update notice at the top of the post:
arucoDict_4_4 = cv2.aruco.Dictionary_get(cv2.aruco.DICT_4X4_100)
arucoDict_6_6 = cv2.aruco.Dictionary_get(cv2.aruco.DICT_6X6_100)
aruco_dicts = [arucoDict_4_4, arucoDict_6_6]
arucoParams = cv2.aruco.DetectorParameters_create()
bbox_color = (0, 255, 0)  # Green, in the RGB order of our overlay.

def detect_markers(image, output_image):
    start_time = time.time()
    # Loop over ArUco dictionaries:
    for aruco_dict in aruco_dicts:
        (corners, ids, rejected) = cv2.aruco.detectMarkers(image,
                                                           aruco_dict,
                                                           parameters=arucoParams)
        # Verify that there are detections in the frame first:
        if len(corners) > 0:
            ids = ids.flatten()
            # Loop over the detected ArUco markers:
            for (marker_corner, marker_id) in zip(corners, ids):
                corner_points = marker_corner.reshape((4, 2))
                (top_left, top_right, bottom_right, bottom_left) = corner_points
                top_right = (int(top_right[0]), int(top_right[1]))
                bottom_right = (int(bottom_right[0]), int(bottom_right[1]))
                bottom_left = (int(bottom_left[0]), int(bottom_left[1]))
                top_left = (int(top_left[0]), int(top_left[1]))
                # Draw the bounding box of the ArUco detection:
                cv2.line(output_image, top_left, top_right, bbox_color, 2)
                cv2.line(output_image, top_right, bottom_right, bbox_color, 2)
                cv2.line(output_image, bottom_right, bottom_left, bbox_color, 2)
                cv2.line(output_image, bottom_left, top_left, bbox_color, 2)
                # Compute and draw the center (x, y) of the marker:
                cX = int((top_left[0] + bottom_right[0]) / 2.0)
                cY = int((top_left[1] + bottom_right[1]) / 2.0)
                cv2.circle(output_image, (cX, cY), 4, (0, 0, 255), -1)
                # Write the ArUco marker ID text:
                cv2.putText(output_image, str(marker_id),
                            (top_left[0] - 15, top_left[1] - 15),
                            cv2.FONT_HERSHEY_DUPLEX,
                            1, (0, 255, 0), 2)
    fps = 1 / (time.time() - start_time)
    fps_text = f'Detecting Markers: {fps:.2f} FPS'
    cv2.putText(output_image, fps_text, (20, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1, bbox_color, 2, cv2.LINE_AA)
    # Add our logo if present:
    try:
        logo_file = '/content/ostirion_logo.jpg'
        img = cv2.imread(logo_file)
        new_size = 50
        img = cv2.resize(img, (new_size, new_size),
                         interpolation=cv2.INTER_AREA)
        # cv2.imread returns BGR; our overlay is RGBA, so reorder the channels:
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        lim = -new_size - 1
        output_image[lim:-1, lim:-1, 0:3] = img
        # Any non-zero alpha will do; get_drawing_array recomputes the channel:
        output_image[lim:-1, lim:-1, 3] = 1
    except Exception:
        pass
    return output_image
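The dictionary and parameter setup above uses the pre-4.7 ArUco API; as the notice at the top of the post warns, OpenCV 4.7.X renamed it. For reference only, a rough sketch of the equivalent setup under the newer interface:
# OpenCV >= 4.7 sketch of the same setup:
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_100)
aruco_params = cv2.aruco.DetectorParameters()
detector = cv2.aruco.ArucoDetector(aruco_dict, aruco_params)
# detectMarkers is now a method of the detector object:
# corners, ids, rejected = detector.detectMarkers(image)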
We load the 4x4 and 6x6 ArUco dictionaries for flexibility; the demonstration uses 4x4 markers for simplicity. Our main loop to start the detection task is the following:
start_input()
label_html = 'Capturing Webcam Stream.'
img_data = ''
while True:
    js_reply = take_photo(label_html, img_data)
    if not js_reply:
        # takePhoto returns '' once the user clicks to stop the stream:
        break
    image = js_reply_to_image(js_reply)
    drawing_array = get_drawing_array(image)
    drawing_bytes = drawing_array_to_bytes(drawing_array)
    img_data = drawing_bytes
These will be our plastic bricks for constructing the ArUco markers. They are from a very famous brand, but any brand will work; in fact, the bricks from this very famous brand are notoriously tricky to remove from a flat plate, so our recommendation is to assemble the markers on the "wrong" side of the plate:
The result from running the demonstration notebook is this:
ArUco markers with ID numbers 8 and 85 are straightforward to make.
Do not hesitate to contact us if you require quantitative model development, deployment, verification, or validation. We will also be glad to help you with your machine learning or artificial intelligence challenges when applied to asset management, automation, or intelligence gathering from satellite, drone, or fixed-point imagery.
The notebook to fully replicate this demonstration is here. This demonstration code is old; make sure to check the notice on top of the post.