Computer Vision. Algorithms, OpenCV, code, history.
On this page:
OpenCV quick start for version 4.1.1, Visual Studio 2019, C++, and Windows 10
OpenCV is an open source computer vision and machine learning software library. It is an awesome undertaking and everything seems to work well. It is well documented, but you do need to know something about computer vision. Most of the documentation explains what the functions do and what their settings are, but little of it explains how or why you would use them. Unfortunately, most of the available books and tutorials that do explain how and why have not kept up with its constant update cycle. It was originally targeted at C, C++, and Python. The C interface has been dropped in more recent versions. There are also functions that target CUDA and GPUs.
OpenCV is actively being developed and updated, on what seems to be a three-to-six-month schedule. Depending on how you look at it, this could be either good or bad. Good because bugs get fixed quickly and you get lots of new things to use. Bad because it means that lots of documentation becomes obsolete very quickly. The most recent version is 4.1.2, which became available on October 12, 2019. If you are looking for help, a quick Google search for “opencv c++ example” will find code examples referring to many different versions, some of which behave differently than the current version. In my Google search listing, the very first hit at the top of the first page is based on version 2.4.13.7. The original version 2.0 was released in 2008, but there have been numerous more recent updates to 2.x made in parallel with the newer versions 3.x and 4.x. Strangely, this version 2.4.13.7 example is from the OpenCV.org site, but if you go to look at the Releases page, version 2.4.13.7 isn’t even there.
Maybe it’s just me, but I found it a big pain to figure out how to get it installed and up and running with Visual Studio and C++. There is a lot of overhead to get set up. As I mentioned, web help can become obsolete pretty quickly, but those obsolete pages stick around for a long time. I also found that a lot of this “help” was missing critical pieces of information or made certain assumptions that you wouldn’t know about if you were new to it. With all of the available OpenCV versions and all of the available Visual Studio versions, the likelihood of everything matching your particular setup is slim. See this for example.
You will find many tutorials and explanations of how to build OpenCV using cmake. I found that using the pre-built opencv_worldxxx.dll’s is the easiest and fastest way to get started, and it works fine. This quick start was written before 4.1.2 was released. It assumes version 4.1.1 and Visual Studio 2019 (the VS 2017 settings are virtually identical). So with the caveat that this is already slightly out of date, here is a "quick" start guide that should help get you up and coding in short order. I haven't tried it yet, but for version 4.1.2, you could probably substitute "412" for "411" in the steps below.
Step 1) Get all the packages downloaded and installed. You will need:
A fast computer with lots of memory, disk space, and a known functional camera
Windows 10 64 bit
Visual Studio 2019 – Community
OpenCV version 4.1.1, installed in the default folders on your C:\ drive
Step 2) In Windows, edit your system’s Environment Variables:
in the Windows search box, type “Environment Variables” and select “Edit the system environment variables”
click Environment Variables…
In the System variables pane
click Path > Edit > New…
enter C:\opencv\build\x64\vc15\bin
click OK
click New…
enter the Variable name: OPENCV_DIR
enter the Variable value: C:\opencv\build\x64\vc15
click OK, OK, OK
(It’s probably a good idea to Restart your computer now.)
Step 3) Configure Visual Studio 2019 – Community and set up a project:
(Note that the following sub-steps will make more sense when you are actually in VS.)
Start a new project targeted for Windows Console applications with Visual C++ (Console App)
On the splash screen click Create New Project, then Console App >> Next
browse and create the project in the folder of your choice.
To get started, name the project video_io
Finally click Create and the default Hello World code opens up in the editor
click Project >> video_io Properties...
Make sure Platform is set to x64
select Debug Mode
select C/C++
General
Additional Include Directories
enter C:\opencv\build\include
Precompiled Headers
select Not Using Precompiled Headers
select Linker
General
Additional Library Directories
enter C:\opencv\build\x64\vc15\lib
Input
Additional Dependencies > Edit…
enter opencv_world411d.lib
select Release Mode
select C/C++
General
Additional Include Directories
enter C:\opencv\build\include
Precompiled Headers
select Not Using Precompiled Headers
select Linker
General
Additional Library Directories
enter C:\opencv\build\x64\vc15\lib
Input
Additional Dependencies > Edit…
enter opencv_world411.lib
In Build > Configuration Manager, make sure that Active Platform is set to x64.
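Before moving on, if you want a quick sanity check that the include and library settings are correct, a tiny program that just prints the OpenCV version is enough (this step is optional):

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // If this builds, links, and runs, the Additional Include Directories,
    // Additional Library Directories, and Additional Dependencies are all correct.
    std::cout << "OpenCV version: " << CV_VERSION << std::endl;
    return 0;
}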
Step 4) Add code:
Delete the Hello World code that Visual Studio created automatically in video_io.cpp and paste in the following simple example. It will run your camera and show the live streaming color image and a live black and white copy.
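A minimal version of this example looks like the following (camera index 0 and the window names are arbitrary choices):

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // Open the default camera (device 0)
    cv::VideoCapture cap(0);
    if (!cap.isOpened())
    {
        std::cout << "Error: could not open the camera." << std::endl;
        return -1;
    }

    cv::Mat frame, gray;
    std::cout << "Press any key in a video window to quit." << std::endl;

    for (;;)
    {
        cap >> frame;                                   // grab the next frame
        if (frame.empty())
            break;                                      // camera disconnected or end of stream

        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);  // live black and white copy

        cv::imshow("Live color", frame);
        cv::imshow("Live black and white", gray);

        if (cv::waitKey(1) >= 0)                        // any key quits
            break;
    }
    return 0;
}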
Save the cpp file and project. Make sure VS isn’t complaining with red underlines about any of the syntax in the video_io.cpp file.
Step 5) Debug:
Make sure the Solution Configuration is set to “Debug” and the Solution Platform is set to “x64” (in the ribbon on the second line of VS).
Click Debug > Start Debugging (or use the F5 key).
After a few seconds (it might take a good deal longer the first time you run the project in Debug mode, while all the DLLs load), your first OpenCV program should be running. You will see three windows open up; re-arrange them as you like. One is a text mode command window, the second is the original color image from your camera streaming live video, and the third is a black and white version of the original image. Hit any key to quit. Here’s how I have the windows arranged on my screen.
Step 6) Compile:
If the Debug version works, change the VS Solution Configuration to “Release” and “x64”.
Click Build > Build Solution
In Windows Explorer, navigate to the folder
[your default folder] > video_io > video_io > x64 > Release
Double click on video_io.exe
The program should run. Depending on how fast your computer is, you may notice that it runs faster and smoother than the Debug version. Once you start adding lots of additional processing, the Release build will definitely run faster than the Debug build.
If you’d like, make a short-cut to the Desktop so you can run it from there.
Step 7) Use this example to build on and have fun with your own code! Check back here for more examples. I’ll post more.
Note: I think that someone at OpenCV should be able to make an install script that does all of the above with a click of a button. Hint, hint...
2-November-2019
PIPE video, old school computer vision, circa 1989
Aspex Incorporated developed the PIPE system (Pipelined Image Processing Engine) in collaboration with the National Institute of Standards and Technology (NIST) in the mid-1980’s. This system was well ahead of its time but also a part of the times. The hardware was developed by Randy Luck, BJ Henrici, and Jim Herriman in conjunction with software by Jim Knapp, Shoshi Biro, and several others, all of Aspex, in consultation with Ernie Kent, Mike Shneier, Tom Wheatley, and others at NIST, and several other university researchers. An early version was described in Kent’s US patent [1] and in Kent, Shneier, and Lumia [2]. The actual PIPE implementation is better described in Luck [3] and [4].
PIPE was designed to process video images at real-time video frame rates. It consisted of modular processing stages (MPS) that could be flexibly connected under program control in many series or parallel combinations. Each MPS was implemented on two large 15” x 13” circuit boards. Each board was populated with about 200 ICs – 74ALS, 74F, and 74AHCT logic, GALs (small FPGA precursors), dynamic RAM, static RAM, and PROMs.
The architecture was optimized for real-time point, spatial, and temporal image processing. The design made heavy use of SRAM look up tables and ALUs to perform arbitrary pixel point processing such as scaling, summing, thresholding or non-linear operations. Each MPS had two frame buffers and could perform two real-time 3 x 3 arithmetic or Boolean convolutions. The MPS had what was called a TVF, a two valued function look up table. With two 8 bit data inputs, a 64K x 8 SRAM could perform any operation on two values. For example, with one convolution performing the X gradient, and the other the Y gradient, the TVF could then be pre-programmed with a table of the square root of the sum of the squares of the X and Y gradients and therefore perform things like the Sobel operator in real-time.
The frame buffers gave the PIPE temporal processing capabilities. Images were written into the frame buffers while the previous image could be simultaneously read out. The timing system allowed all frame buffers to run synchronously with each other. An entire PIPE consisted of up to 8 such MPS board sets. Each MPS had local image flow connections forward from the previous MPS, recursively from itself, and backwards from the next MPS in the pipeline. This made the system perfect for experimenting with various frame to frame and optic flow algorithms.
In addition to the MPS, the PIPE had a video A/D front end and D/A back end for B&W or RGB cameras and other video sources and video outputs to monitors, a set of input frame buffers, a set of output frame buffers, a control stage that orchestrated all the control and handled host computer interface, and finally a processor called ISMAP which performed several kinds of histogram and cumulative histogram functions at frame rate. The system also had 6 video buses that allowed images to be broadcast from anywhere to anywhere in the system beyond the local connections. Other features included region of interest processing and host controlled literal bytes. An AT/386 PC running MS-DOS functioned as a host control and programming computer. PIPE could also connect to a VME or Multibus interface for higher level image understanding applications on high speed computers.
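To make the TVF idea concrete, here is a rough software sketch of a 64K x 8 two-valued-function table pre-programmed with the gradient magnitude (this is my own illustration, not PIPE code; the clamping and unsigned-gradient handling are simplifying assumptions). In hardware, applying the table was just one memory read per pixel, which is why an arbitrary two-input function could run at frame rate.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Build a 64K-entry table indexed by two 8-bit inputs (e.g. |X gradient| and
// |Y gradient|), pre-programmed with sqrt(a^2 + b^2) clamped to 255.
std::vector<uint8_t> buildMagnitudeTVF()
{
    std::vector<uint8_t> tvf(256 * 256);
    for (int a = 0; a < 256; ++a)
        for (int b = 0; b < 256; ++b)
        {
            double m = std::sqrt(double(a) * a + double(b) * b);
            tvf[(a << 8) | b] = static_cast<uint8_t>(std::min(m, 255.0));
        }
    return tvf;
}

// Applying the "hardware" table is a single memory lookup per pixel pair.
inline uint8_t tvfLookup(const std::vector<uint8_t>& tvf, uint8_t gx, uint8_t gy)
{
    return tvf[(gx << 8) | gy];
}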
PIPE demo tape
In 1989 I created a PIPE demo video tape for use as a marketing tool. It included implementations of algorithms that I developed along with example video sequences that some of our customers had developed. Click the YouTube link below to watch the 1989 PIPE demo video. Unfortunately the original 3/4" U-Matic master tape has disappeared in the mists of time and only a VHS copy of the demo remains. So the video quality that you see on the YouTube video is not great. Sorry.
Description of the demo sequences on the video
1) A sorting application using pattern matching
- This is a translation, rotation, and scale invariant matching algorithm. It first computes the Sobel edge direction, thresholded by the magnitude. Then the histogram of that is found. With a black background, the object becomes dominant in the histogram. This histogram is simply a list containing a count of the number of pixels in each of 180 edge directions (2 degree increments were used). Notice that the concept of image structure is eliminated and the histogram becomes essentially a unique signature for the object. Since the structure is gone, it becomes translation invariant. As the object is rotated, the histogram pattern shape remains the same but it just rotates around the 180 possible directions. Object matching then just becomes a rote comparison of the known signature histogram with the current histogram at all 180 rotations. A few other optimizations needed to be applied. In an image with square pixels, the pixel spacing is 1.414 times longer in the diagonal directions, which causes an error, so a correction must be applied to the histogram pattern. If the measured pattern is normalized, for example to the bin with the maximum count, then over a reasonable range of sizes, the matching becomes scale invariant as well. In the video, the PIPE is performing the image processing, the ISMAP is performing the real-time histogram, and a program in the attached PC is doing the histogram normalizing and pattern matching. Two signature patterns were tested, one for the front and one for the back of the VHS box. As I recall, the image processing worked at about 10 to 15 Hz, but clearly from the video, it looks like the computer matching is slower, about 1 Hz. I gave a paper on this [5] at an Electronic Imaging East conference. Below is a link to a PDF of this paper, and a rough OpenCV-style sketch of the signature and matching idea follows it. We also demonstrated this algorithm using an attached neural network co-processor to perform the pattern matching at an SPIE conference in the late 1980’s. This algorithm is somewhat like a global version of the HOG algorithm and was independently invented at around the same time.
Paper PDF: translation_rotation_scale_invariant.pdf (7943 kb)
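Here is a rough modern OpenCV sketch of that signature-and-match idea. The 180 bins of 2 degrees, the magnitude threshold, and the max-bin normalization follow the description above; the function names, the threshold value, and the squared-difference comparison are my own assumptions, not the original PIPE/PC code.

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <limits>
#include <vector>

// Build the 180-bin edge direction histogram (2 degree bins), counting only
// pixels whose gradient magnitude passes a threshold, then normalize to the
// peak bin so the signature is roughly scale invariant.
std::vector<float> edgeDirectionSignature(const cv::Mat& gray, float magThresh = 50.0f)
{
    cv::Mat gx, gy, mag, angle;
    cv::Sobel(gray, gx, CV_32F, 1, 0);
    cv::Sobel(gray, gy, CV_32F, 0, 1);
    cv::cartToPolar(gx, gy, mag, angle, true);              // angle in degrees, 0..360

    std::vector<float> hist(180, 0.0f);
    for (int y = 0; y < gray.rows; ++y)
        for (int x = 0; x < gray.cols; ++x)
            if (mag.at<float>(y, x) > magThresh)
                hist[static_cast<int>(angle.at<float>(y, x) / 2.0f) % 180] += 1.0f;

    float peak = *std::max_element(hist.begin(), hist.end());
    if (peak > 0.0f)
        for (float& h : hist)
            h /= peak;
    return hist;
}

// Compare a live signature against a stored one at all 180 circular shifts;
// the smallest distance is the match score and the best shift is the rotation.
float matchSignature(const std::vector<float>& live, const std::vector<float>& stored)
{
    float best = std::numeric_limits<float>::max();
    for (int shift = 0; shift < 180; ++shift)
    {
        float d = 0.0f;
        for (int i = 0; i < 180; ++i)
        {
            float diff = live[i] - stored[(i + shift) % 180];
            d += diff * diff;
        }
        best = std::min(best, d);
    }
    return best;
}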
2) Object tracking applications
- The first example is performing a frame difference followed by thresholding and Boolean dilation. Histograms from ISMAP are finding the center of mass of the lit up pixels and then the host is drawing a cross hair cursor on that spot.
- The Boston University sequences were courtesy Allen Waxman and are described in [6] and [7].
3) Template matching for quality control
- This is showing a Sobel magnitude image, followed by histogramming. The differences between the current live histogram and a stored known good histogram are shown as a number.
4) Various image re-mapping functions
- The first part is performing a log-polar transformation similar to Weiman and Chaiken [8] using non-linear functions for the read addresses of the image in the TVF.
- The second part shows terrain mapping sequences, courtesy Eamon Barrett at Lockheed. Essentially this is Google Earth, 1988 style.
- The driving demo shows variable resolution, depending on where the cursor is located. The idea is that if the user’s eye gaze point could be sent at a low data rate back to the remote vehicle, then a variable resolution low bandwidth image could be returned to the user. The image is high resolution where the user is looking and progressively lower resolution in the periphery. This variable resolution fovea idea was conceived around the same time that the data compression techniques in JPEG and MPEG were invented. In the video, the periphery pixels look like they could use some filtering.
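For comparison, the log-polar style remapping in the first part can be done today with a single OpenCV call; the center and radius choices in this sketch are illustrative:

#include <opencv2/opencv.hpp>
#include <algorithm>

// Map an image into log-polar space around its center, roughly the kind of
// transformation shown in the demo.
cv::Mat toLogPolar(const cv::Mat& src)
{
    cv::Mat dst;
    cv::Point2f center(src.cols / 2.0f, src.rows / 2.0f);
    double maxRadius = std::min(center.x, center.y);
    cv::warpPolar(src, dst, src.size(), center, maxRadius,
                  cv::INTER_LINEAR + cv::WARP_POLAR_LOG);
    return dst;
}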
5) Thinning using Boolean morphology
- PIPE’s Boolean neighborhood is performing connectivity preserving thinning on binary images.
6) Blob analysis using connected components
- The PIPE is just capturing and thresholding the image, the attached PC is running a connected components program finding areas and bounding boxes. We had plans to develop a real-time connected components processor (CONCOMP) board for the PIPE, but never finished it.
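A modern OpenCV equivalent of that attached-PC program is only a few calls; the threshold value here is arbitrary:

#include <opencv2/opencv.hpp>
#include <iostream>

// Threshold, then label connected components and pull out each blob's
// area and bounding box.
void blobAnalysis(const cv::Mat& gray)
{
    cv::Mat bin, labels, stats, centroids;
    cv::threshold(gray, bin, 128, 255, cv::THRESH_BINARY);
    int n = cv::connectedComponentsWithStats(bin, labels, stats, centroids);

    for (int i = 1; i < n; ++i)                    // label 0 is the background
    {
        int area = stats.at<int>(i, cv::CC_STAT_AREA);
        cv::Rect box(stats.at<int>(i, cv::CC_STAT_LEFT),
                     stats.at<int>(i, cv::CC_STAT_TOP),
                     stats.at<int>(i, cv::CC_STAT_WIDTH),
                     stats.at<int>(i, cv::CC_STAT_HEIGHT));
        std::cout << "blob " << i << ": area " << area
                  << ", box " << box.x << "," << box.y
                  << " " << box.width << "x" << box.height << std::endl;
    }
}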
7) Dynamic image centering
- An early example of image shake reduction. The PIPE and the attached PC are finding the centroid of the Space Shuttle in the image, then adjusting X and Y address offsets to keep the centroid in the center of the frame. Every video camcorder made now has a similar feature, some use tracking, some use accelerometers.
8) Model based vision
- The first part shows the PIPE running a corner detector using Gaussian curvature. Shrinking reduces these curvature maxima to dots. The ISMAP histograms make finding the X, Y locations of the dots easy. The attached PC then performs the model match as shown on the PC’s screen.
- The second part was courtesy of Chuck Dyer at the University of Wisconsin and is described in [9].
9) Motion flow computations for velocity and direction
- The first part of this demo is performing the simple Horn and Schunck optical flow computation [10]:
- Flow = -(ΔI/Δt) / √((ΔI/Δx)^2 + (ΔI/Δy)^2)
- Essentially this is just the frame difference (delta time) image divided by the Sobel magnitude (spatial gradient magnitude). As the frames flow through the PIPE, temporally, the delta time image is computed two frames apart using the first and third, and the gradient magnitude is computed from the middle, second frame.
- The second part of this demo is based on van Santen and Sperling [11] and Adelson and Bergen [12]. It is computing “Reichardt” type quadrature detectors in (x, t) and (y, t), thresholding on strength, taking the atan2 of the (x, t) and (y, t), and finally color coding for direction. The PIPE implementations of these image flow algorithms are described in [13].
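A rough OpenCV sketch of the first computation (the frame difference divided by the spatial gradient magnitude) might look like the following; the frame roles follow the description above, and the divide-by-zero guard is my own addition:

#include <opencv2/opencv.hpp>

// "Normal flow": temporal difference between the first and third frames,
// divided by the gradient magnitude of the middle frame.
cv::Mat normalFlow(const cv::Mat& first, const cv::Mat& middle, const cv::Mat& third)
{
    cv::Mat dt, gx, gy, mag, flow;

    cv::subtract(third, first, dt, cv::noArray(), CV_32F);  // delta time image

    cv::Sobel(middle, gx, CV_32F, 1, 0);                     // spatial gradients
    cv::Sobel(middle, gy, CV_32F, 0, 1);
    cv::magnitude(gx, gy, mag);

    cv::Mat safeMag = cv::max(mag, 1.0f);                    // avoid divide-by-zero
    cv::divide(-dt, safeMag, flow);
    return flow;
}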
10) Edge detectors
- The first part is showing the Sobel direction, thresholded by the magnitude, and then color coded for edge direction. One PIPE stage could perform all this at 60 Hz.
- The second part is showing thresholded zero crossings from the difference of Gaussians.
11) Hough transform for lines
- The Hough transform based on histograms of Sobel direction images. You can see the input binary image in the lower left and the Hough space along the top.
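For reference, the standard OpenCV line Hough transform (not the PIPE direction-histogram variant) looks like this; the Canny and vote thresholds are illustrative:

#include <opencv2/opencv.hpp>
#include <vector>

// Detect straight lines: edge detect, then vote in (rho, theta) Hough space.
std::vector<cv::Vec2f> findLines(const cv::Mat& gray)
{
    cv::Mat edges;
    cv::Canny(gray, edges, 50, 150);

    std::vector<cv::Vec2f> lines;                        // each entry is (rho, theta)
    cv::HoughLines(edges, lines, 1, CV_PI / 180, 100);   // 1 px, 1 degree bins, 100 votes
    return lines;
}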
12) PIPE's menu driven, graphical software interface is both easy to use and versatile
- PIPE had its own micro-coded programming system. Run-time programs could be downloaded, and it would run by itself. We devised a software tool called ASPIPE [14] that was written in C, ran on MS-DOS, and used the PC’s EGA text mode graphics characters. It was essentially a text mode pop up windowing system, developed before Windows existed. I based some of its visual design and organization on an article on the Smalltalk environment that I had read in Byte magazine. The entire ASPIPE program was less than 640K bytes, though as I recall it did use overlays. The software used graphic representations of the signal flow in the PIPE hardware. It presented the entire system with the physical orientation of the processors along the horizontal axis and frame time on the vertical axis. To program it, you set up the image data flow through space and time. You could click on objects and menus would pop up to let you make selections on how that hardware object would behave at that moment in time. Click on one of the processors on this chart and a diagram of the MPS would pop up. Then click on, for example, a neighborhood operator and set up the mode and mask in a pop up. A second tool called LUTGEN let you enter equations to make the various look up tables. The LUT functions could be graphed out, created, saved as small files, and then re-used in any program. PIPE’s main control was via an attached PC. Due to limits in the state of the art at the time, the PC/AT bus interface was not fast. PIPE had two computer ports and some customers opted to connect the second port to a VME or Multibus interface to link higher speed computers like the Sun, Sequent, Apollo, Masscomp, and others for faster high level vision processing.
2019 retrospective, 30+ years on
The PIPE’s pixel, image, and frame synchronous nature was a significant benefit, but in hindsight also a limitation for some applications. Image size was fixed at 256 x 240, 8 bit precision, and 60 frames per second. This meant that the user did not need to deal with setting the frames up, which was good, but also limiting. After customer requests, extensions were added that allowed 512 x 480 image resolution at 15 Hz and 16 bit precision, but this capability was not easy to use. For some kinds of algorithms like morphology, one might have wanted to cascade several erosions or dilations in sequence without incurring frame delays, but the architecture didn’t allow that. Everything that flowed into an MPS needed to pass through a frame delay. Pipelining up to 8 stages allowed the 60 Hz frame rate to remain at full speed, but there could be latency delays of up to 8 frames. If an algorithm needed more than 8 stages of processing, you could then add more time by slowing the processing down to 30, 20, 15, 10, 7.5, etc. Hz. Other contemporary vision processor architectures could avoid this issue. For example, ERIM’s Cytocomputer [15] was intended to perform cascades of morphology operations within a frame time and it was really good for those kinds of inspection type algorithms, but it didn’t have the temporal optic flow capabilities that the PIPE had. Cal Tech’s PIFEX architecture [16] was in some ways more flexible because it used a more general cross bar architecture to connect its various processing hardware elements. I don’t think it was ever commercialized though. By the late 1980’s, we started to consider a next generation PIPE 2 architecture that would have combined the best features of the PIPE with cross bar connections similar to PIFEX. The PIPE was also contemporaneous with, and in many respects faster for many image processing applications than, the more general purpose WARP systolic array processor from Carnegie Mellon University. An 8 stage PIPE system could run at up to 1.2 GOPS, which was pretty good for the mid/late 1980's.
Aspex Incorporated ultimately sold 42 PIPE systems, a few with 1 MPS, but most either with 3 or 8. These went to university, government, and corporate research laboratories in the US and Asia. Some 10 systems were sold to Neuromedical Systems Inc. which used PIPE as the image processing front end for the first generation of their neural network based automated pap smear screening system, PAPNET described in their patents and this article [17].
One of the reasons the PIPE 2 never got developed was that ultimately something else did in the PIPE and all the other contemporaneous dedicated processors. By the early 1990’s, Intel 486 and Pentium processors and the new PCI bus got fast enough to be realistic for machine vision use. Essentially Moore’s Law caught up. This enabled most commercial customers to do their machine vision applications using a simple PCI frame grabber and software running on the PC, a much more cost effective system. While the spatio-temporal image flow processing of the PIPE was very interesting to some in the research community, most industrial applications didn’t need it. Today I can write many of the same applications seen on this video using OpenCV and C++ (or Python). On my Intel i7-8700 system, these apps can run at speeds close to what the PIPE was doing back then, but with much higher resolution frames. I put my desktop PC together for a very reasonable sub-$1K cost. Lesson learned: ignore Moore’s Law and its various corollaries at your peril! Another lesson learned: a major cost reduction will almost always beat an interesting technology, even when there is also a small performance reduction. I had a professor tell me that the PIPE was really cool, but that he could get 10 Sun-3 systems and have 10 students working at the same time for the cost of one PIPE. Even though the Sun-3’s were not real time, he thought that the increased student utilization was worth it. While it might be interesting to contemplate putting all the PIPE’s functionality into a few large FPGA’s, there is very little you can create in this way that could easily compete with close to free.
I think a second reason PIPE 2 never got developed had to do with the fall of the Berlin Wall. Many of the PIPE customers got their funding from various agencies of the US Defense Department. During the Reagan era, there seemed to be lots of money available. After the Wall fell and the Soviet Union broke up, for all intents and purposes, the Cold War ended and these kinds of funding sources dried up.
It was a lot of blood, sweat, and heartache, but also fun while it lasted!
The PIPE had many attributes that were really useful for temporal computer vision processing. Today, most computer vision applications for industrial machine vision or for Convolutional NN’s and Deep NN’s do not use or take much advantage of the temporal domain. I believe that hot contemporary applications, like vision for self driving cars for example, could benefit a lot from using the rich information available in the temporal domain. Also, I think that OpenCV could really benefit from making the temporal domain easier to set up and use.
Back in the 1980’s there was a lot of interest in special purpose attached processors. The PIPE was one, and there were many others with similar acronyms: PIFEX, Cytocomputer, and Warp (all mentioned above), PUMPS, PICAP, ZMOB, Vicom, Butterfly, FLIP, MPP, Pixar (yes, that Pixar was originally hardware), HNC, and many others. These were all developed because general purpose computers were not fast enough to handle the massive amounts of high speed data involved in computer vision and graphics. Today, the latest desktop PCs and even the processors in smart phones are fast enough for many vision tasks. However, now there is a lot of renewed interest in special purpose processors that can handle certain tasks at much higher speeds than a general purpose CPU. At the Embedded Vision Summit 2019 conference, I heard that over the last 2 years, VC’s have invested more than $1.5B in special purpose vision, NN, and AI chip companies. The original 1980's/1990's backpropagation based neural networks typically had only 3 layers: an input layer, a hidden layer, and an output layer. In the PAPNET cancer screening system [17] mentioned above, the NN operated on 32 x 32 chunks of monochrome pixels that were selected by the PIPE image processing as the ones most likely to contain the nucleus of a cell. The NN then determined if that image chunk most likely contained a cancer cell. The NN therefore had 32 x 32 = 1024 input neurons. The hidden layer was about 25% of the input, or roughly 256 neurons, and there was a single output neuron since the network was just classifying how likely it was that the 32 x 32 area contained a cancer cell. Originally they used an attached “neurocomputer” (HNC Anza), but Moore’s Law caught up with that too, and later production versions used a standard computer instead.
In contrast, today’s Convolutional NN’s and Deep NN’s, such as ResNet-50, AlexNet, and many others, are different beasts altogether. These NN’s can operate on high resolution and/or color images. AlexNet uses 256 x 256 RGB input images, has 8 layers, 60M parameters, and 650K neurons. It was programmed to run on two Nvidia GPUs. ResNet-50 has 50 layers. These NN’s are only realistic on attached GPU’s, FPGA’s, or custom ASIC processors. Today's AI chip companies are quoting NN inference in the tera- and peta-ops range. So it seems that the battle between custom attached processors and the standard CPU has come full circle.
References
[1] Kent, US Patent 4,601,055, (1986).
[2] Kent, Shneier, and Lumia, “PIPE (Pipelined Image-Processing Engine)”, Journal of Parallel and Distributed Computing, V2, Issue 1, pp 50-78 (Feb. 1985).
[3] Luck, “PIPE: A Parallel Processor for Dynamic Image Processing”, Proc. SPIE V.758, (1987).
[4] Luck, “An Overview of the PIPE System”, Third Int’l Conference on Supercomputing: Supercomputing ‘88, Vol III, Boston, MA, (1988).
[5] Luck, “Translation, Scale, and Rotation Invariant Pattern Recognition Using PIPE”, Proc. Electronic Imaging East ‘88, (1988).
[6] Waxman, Wong, Goldenberg, and Bayle, “Robotic eye-head-neck motions and visual-navigation reflex learning using adaptive linear neurons”, Neural Networks, V1, Supplement 1, page 365, (1988).
[7] Baloch and Waxman, “A neural system for behavioral conditioning of mobile robots”, International Joint Conference on Neural Networks, (1990).
[8] Weiman and Chaiken, “Logarithmic spiral grids for image processing and display”, Computer Graphics and Image Processing, 11, (1979).
[9] Verghese, Gale, and Dyer, “Real-time motion tracking of three-dimensional objects”, Proceedings, IEEE International Conference on Robotics and Automation, 1990, pages 1998-2003.
[10] Horn and Schunck, “Determining Optical Flow”, Artificial Intelligence 17, (1981).
[11] van Santen and Sperling, “Elaborated Reichardt detectors”, JOSA, Vol. 2, No. 2, (1985).
[12] Adelson and Bergen, “Spatiotemporal energy models for the perception of motion”, JOSA A, Vol. 2. Issue 2, (1985).
[13] Luck, “PIPE, a parallel processor for dynamic image processing”, Proc. SPIE V.758, (1987).
[14] Luck, “ASPIPE: A Graphical User Interface for the PIPE System”, Proc. SPIE V.1076, (1989).
[15] Sternberg, “Parallel architectures for image processing”, Proc. 3rd International IEEE COMPSAC, pp. 712-717, (1978), (and numerous other subsequent articles and patents by Sternberg, Lougheed, and/or McCubbrey).
[16] Gennery and Wilcox, US Patent 4,790,026, (1988).
[17] Luck, Tjon, Mango, Recht, Lin, Knapp, "PAPNET: An Automated Cytology Screener using Image Processing and Neural Networks", Proc. SPIE 20th AIPR Workshop, V.1623, 161-171 (1991).
5-December-2019
updated 31-December-2019
updated 22-January-2020
updated 29-June-2022
Generic Neighborhood and Scale Space processing
coming sometime!
Copyright 2019 - 2022 ElectroLuck