When the microscope gets a brain: Real-time AI on a $20 chip

Source: Ova Solutions
We built a system that tracks, counts, and classifies microscopic objects at 60 FPS — entirely on-device, with no cloud in sight. Here’s how.
There’s a huge layer of tasks in laboratories around the world that is still done with human eyes. Like, really huge. I’m talking about everything where a microscope is used — bacteria colonies, cell structures, tissues, pathogens, substance distribution. All of this is a visual class of tasks.
A lab technician sits down, looks through a microscope, and makes a call: what they see, how many cells, how they’re behaving.
Chemical analysis was figured out a long time ago. Today, you give a blood sample for an express test, and within minutes, a machine breaks it down into components using optical or chemical methods. Markers bind, sensors measure concentration, electrodes give a signal — all automatic. But those are simple chemical compounds. What do you do when you need not a chemical, but a structural analysis? When you need to look at a complex image and say: Here are bacteria, here are living organisms, here’s this morphology, and there’s another one?
This class of tasks will be fully automated in the near future. And we can show an example of how that’s going to happen. We built a system that tracks microscopic objects in real time. This system:
- Is entirely built on an embedded chip that costs $20, with no cloud, no external GPU, no internet
- Processes each frame in about 16 milliseconds, which works out to over 60 FPS
- Has power consumption of around one watt and runs off USB
Why did we start with sperm?
Because it’s a really good starting point. Sperm are simple objects in motion. You don’t just need to analyze a photo — you need to see movement. Sperm cells are swimming around, and you need to time them, measure distances, and observe — who’s fast, who’s slow, who didn’t go anywhere at all — plus assess morphology, including structure, size, and whether they look the way they should. From all of this, a fertility index is calculated.

And in the vast majority of labs worldwide, almost 90%, this is still done by a doctor. They sit down, look through a microscope, and the result depends on their qualifications. Two different technicians looking at the same sample can produce different numbers. You can’t say it’s some massive global problem, but it’s not good. It’s an entire area of men’s health where everything is still done with analog methods, without digital diagnostics that would give a definitive answer.
And then there’s the human factor from the other side. The sample has to be fresh — within an hour of collection. So you show up at a clinic, they hand you a cup, and say here’s a little room for you, good luck. Then you smile at everyone, hand it in, and wait. Not the most pleasant procedure, you know. It’s not like guys are lining up to repeat this every Wednesday.
So the question was: What if the instrument itself counted everything? Load it up, press a button, and get a printout with all the indices. No subjectivity, no dependence on whether the technician got enough sleep or not.
Why not the cloud?
That’s the first thing everyone asks. Record the video, send it to a server, a powerful GPU crunches everything, and sends back the result. And for most tasks — yes, that’s a normal approach. But there’s a nuance here. These instruments sit in clinics. The second you start sending patient data to a server, you need patient identification, then looking up that patient in a database, encryption, HIPAA, GDPR — all of that is a complex architectural solution.
But if the analysis is done on-site? The instrument doesn’t care at all who owns that sample. No data security requirements apply to it because it doesn’t store any data. It produces a report, the hospital takes that report and saves it in their own database, which already has all the compliance built in. Everything stays on-site. That’s a huge plus.
There’s also the fact that connectivity isn’t available everywhere. Transmission latency kills real-time. And cloud inference costs money forever, while a chip is a one-time purchase.
So we set ourselves one key goal: all analysis on the chip, no uploading to the cloud.
Let’s talk hardware
We went with the NXP i.MX 8M Plus on a Variscite Symphony board. It has an NPU rated at 2.3 TOPS, that is, 2.3 trillion operations per second. Not a monster, but for analyzing one microscope field of view, it’s enough. The chip costs $15–25 depending on configuration. The recognition algorithm adds about one watt to power consumption.

The model is the YOLOv8 Nano, the lightest version of YOLO, built for edge. Fast, small, made for exactly this kind of hardware.
Training was the interesting part. YOLO needs labeled data — hundreds of frames, with bounding boxes drawn around every object on each one. We had about 40 particles per frame. Labeling that by hand is slow, expensive, and error-prone. We wrote a Python script with OpenCV that does the labeling automatically.
Microscope footage has good contrast — dark objects, light background — so good old computer vision (thresholds, contours) finds everything on its own. We used that to generate the dataset, and then YOLO learned to generalize and catch what simple rules miss. From raw video to a complete dataset in seconds, with zero manual work.
Next was quantization. The NPU only works with INT8, while neural networks are normally trained in float32. You compress the model, which sounds scary, but with proper calibration, accuracy barely drops. Meanwhile, the model is 75% lighter (one byte per weight instead of four), and inference gets full hardware acceleration.
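For readers who have not seen INT8 quantization before, here is a minimal sketch of the underlying arithmetic, assuming a simple affine scheme with one scale and one zero point for the whole tensor. Real conversion tools handle this per tensor or per channel and calibrate on representative input frames. The function names here are hypothetical and are not part of any toolchain.

```python
def calibrate(samples):
    """Derive a scale and zero point from the observed value range."""
    lo = min(min(samples), 0.0)  # the range must include 0 so that 0.0 maps exactly
    hi = max(max(samples), 0.0)
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = round(-128 - lo / scale)
    return scale, zero_point

def quantize(values, scale, zero_point):
    """Map floats to int8 codes, clamping to the representable range."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]

def dequantize(codes, scale, zero_point):
    """Recover approximate floats from int8 codes."""
    return [(q - zero_point) * scale for q in codes]

def size_reduction(n_weights, float_bytes=4, int_bytes=1):
    """Fraction of storage saved when each weight shrinks from 4 bytes to 1."""
    return 1 - (n_weights * int_bytes) / (n_weights * float_bytes)
```

The 75% figure falls straight out of `size_reduction`: one byte per weight instead of four. The accuracy cost comes from the rounding in `quantize`, which is why the calibration range matters.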
Result: ~16 ms per 320×320 frame. Over 60 FPS. 40+ objects per frame. Each one tracked across frames, velocity calculated, color-coded in real time — green (fast) or red (slow).
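The tracking step can be sketched as follows. The article does not say which tracker the system uses, so this assumes the simplest option: greedy nearest-neighbour matching of detection centers between consecutive frames. The names `match_tracks`, `speed_px_per_s`, and `color_for`, and the values for `max_dist` and `fast_threshold`, are illustrative assumptions.

```python
import math

def match_tracks(prev, curr, max_dist=20.0):
    """Greedy nearest-neighbour matching.

    prev: {track_id: (x, y)} from the previous frame
    curr: [(x, y), ...] detection centers in the current frame
    Returns {track_id: index into curr}.
    """
    pairs = sorted(
        (math.dist(p, c), tid, j)
        for tid, p in prev.items()
        for j, c in enumerate(curr)
    )
    assigned, used = {}, set()
    for dist, tid, j in pairs:
        if dist > max_dist:
            break  # every remaining pair is even farther apart
        if tid in assigned or j in used:
            continue
        assigned[tid] = j
        used.add(j)
    return assigned

def speed_px_per_s(p, c, frame_dt=0.016):
    """Speed in pixels per second, given the time between two frames."""
    return math.dist(p, c) / frame_dt

def color_for(speed, fast_threshold=100.0):
    """Color code as in the article: green for fast, red for slow."""
    return "green" if speed >= fast_threshold else "red"
```

Converting pixels per second into micrometers per second needs the microscope's pixel scale, which depends on the optics and is not given here.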
On resolution, speed, and certainty
So, 320×320 is a crop, not the full microscope image. If you need the full field of view, say 1280×1280, that’s 16 tiles of 320×320. At 16 ms per tile, that comes out to roughly 256 ms for the whole image. Here’s a good takeaway, by the way: one second of footage is enough to observe sperm cells, because they’re small and fast, so you can draw all the conclusions from it. Record that one second, and on the crop we get the result in 5 seconds. At full resolution, it’s 20–30 seconds. Load it up, wait, get a report.
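The tile arithmetic above is easy to check in a few lines. This is a sketch that assumes non-overlapping tiles and counts inference time only. In practice, objects that straddle a tile border would probably call for some overlap, which adds tiles. The function names are hypothetical.

```python
import math

def fps_from_frame_ms(frame_ms):
    """Frames per second for a given per-frame processing time."""
    return 1000.0 / frame_ms

def tile_origins(full_w, full_h, tile=320):
    """Top-left corners of the non-overlapping tiles that cover the image."""
    cols = math.ceil(full_w / tile)
    rows = math.ceil(full_h / tile)
    return [(c * tile, r * tile) for r in range(rows) for c in range(cols)]

def full_image_ms(full_w, full_h, tile=320, tile_ms=16.0):
    """Inference time for one full image, processed tile by tile."""
    return len(tile_origins(full_w, full_h, tile)) * tile_ms
```

With these definitions, `fps_from_frame_ms(16)` gives 62.5, `len(tile_origins(1280, 1280))` gives 16, and `full_image_ms(1280, 1280)` gives 256.0, matching the numbers in the text.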
Sperm is the starting case, but the approach itself works for any task where someone is currently looking through a microscope and making a decision. Blood cells. Bacteria colonies. Tissues. Water quality, including microplastics and algae. Industrial quality control to assess contamination in lubricants and fluids.
Here’s one example I particularly like: ovulation.

There are ovulation mini-microscopes. You lick the lens, wait, calcium salt crystals form, and create a pattern. If it’s peak ovulation, you see a characteristic lattice. The device is cheap and has been on the market for years. The problem is that people can’t tell if they’re seeing the peak or not. Read the reviews — they literally write: “I think I see something, some kind of lines, or not?” It turns into some kind of guesswork. And this is a serious decision. And right here — with a small camera and a small chip — the system tells you definitively: yes, peak, or no, not peak.
And that’s the whole secret. People prefer certainty. Remember mercury thermometers? They worked perfectly fine, but everyone switched to digital, and not just because of the mercury. It’s because people subconsciously prefer to hand off the responsibility to an instrument. A number on a screen instead of “well, it looks like it’s somewhere around 98, maybe 100.” Systems that give a definitive answer will earn far more trust than expert methods that rely on manual interpretation. That’s just a fact.
On the horizon
What’s next is actually even more interesting than what we’ve already done. The i.MX 8M Plus is a full Linux SoC, great for devices that need an OS and a display. But now the first low-power, microcontroller-class AI chips are appearing from Nordic, STM, and others. They are small and inexpensive. They won’t handle a video stream, but they can absolutely analyze a single image. You take a photo, feed it into the chip, and get your answer.
Imagine a device the size of a flash drive. Plug it into USB, take a shot, press a button, and you’re done. That adds $10–15 in manufacturing cost. The hardware already exists, though it is only just starting to appear. What’s missing is trained models, deployment pipelines, and clinical validation. That’s what we’re working on right now.
In general, if you look at the bigger picture, absolutely everything related to laboratory structural analysis, where you need to find a characteristic pattern in an image — bacteria, cells, tissues, fluids — is a market for solutions like this.
So, what we proved: A $20 chip handles real-time object detection at 60+ FPS. Training data can be generated automatically. INT8 quantization cuts model size by 75% with minimal accuracy loss. And one pipeline applies to dozens of tasks.
The microscope is getting a brain. And that brain costs less than a plate of spaghetti with a glass of wine.
Lisa Voronkova is the CEO of OVA Solutions, a medical device R&D company that has taken over 200 devices from concept through manufacturing. She is also the author of Hardware Bible: Build a Medical Device from Scratch, available on Amazon.