In 2015, a research paper by Gatys, Ecker and Bethge posited that you could use a deep neural network to apply the artistic style of a painting to an existing image and get amazing results, as though the artist had rendered the image in question.
Soon after, a terrific and fun app called Prisma was released to the app store; it lets you do this on your phone.
How do they work?
There’s a comprehensive explanation of two different methods of Neural Style Transfer here on Medium; I won’t attempt to reproduce it, because the author, Subhang Desai, does such a thorough job. He explains that there are two basic approaches: the slow “optimization” approach (2015) and the much faster “feedforward” approach, where styles are precomputed (2016).
For the first, more straightforward approach, I’ve found two main projects: one based on PyTorch and one based on TensorFlow. Frankly, I found the PyTorch-based project insanely difficult to configure on a Windows machine (I also tried on a Mac), with so many missing libraries and things that had to be compiled. The project was originally built for specific Linux-based configurations and made a lot of assumptions about how to get the local machine up and running.
But the second project (the one linked above) is based on Google’s TensorFlow library and is much easier to set up, though from GitHub message board comments I conclude it’s quite a bit slower than the PyTorch-based project.
On-the-fly “Optimization” Approach
As Desai explains, the most straightforward approach is to do an on-the-fly paired learning of two images — the style image and the photograph.
The neural network learning algorithm pays attention to two loss scores, which it mathematically tries to minimize by adjusting the pixels of the generated image:
- (a) How close the generated image is to the style of the artist, and
- (b) How close the generated image is to the original photograph.
In this way, by iterating multiple times over newly generated images, the code generates images that are similar to both the artistic style and the original image — that is, it renders details of the photograph in the “style” of the image.
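To make those two loss scores concrete, here’s a minimal sketch in plain Python. It is not taken from either project: the feature lists stand in for the activations a convolutional network would produce, the function names and the default weights are my own hypothetical choices, and I’ve used lists instead of tensors so it runs with no dependencies. Style is compared through Gram matrices (channel-to-channel correlations), as in the Gatys paper, while content is compared position by position.

```python
def gram_matrix(features):
    """Channel-by-channel correlations.

    `features` is a list of channels, each a flat list of activations
    (toy stand-ins for one layer of a convolutional network).
    """
    n = len(features[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in features]
            for ci in features]


def flatten(rows):
    return [value for row in rows for value in row]


def mean_squared_error(xs, ys):
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def content_loss(generated, photo):
    # (b) how close the generated image is to the original photograph
    return mean_squared_error(flatten(generated), flatten(photo))


def style_loss(generated, style):
    # (a) how close the generated image is to the style of the artist
    return mean_squared_error(flatten(gram_matrix(generated)),
                              flatten(gram_matrix(style)))


def total_loss(generated, photo, style,
               content_weight=1.0, style_weight=1000.0):
    # The weights are hypothetical; real projects expose them as options.
    return (content_weight * content_loss(generated, photo)
            + style_weight * style_loss(generated, style))
```

Shuffling the spatial positions the same way in every channel leaves the style loss at zero but raises the content loss, which is why the style score captures texture rather than layout.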
I can confirm that this “optimization” approach, iterating through images, takes a long time. Reasonable results took 500+ iterations. The example image below took 1 hour and 23 minutes to render on a very fast machine equipped with a 6 GB NVIDIA Titan 780 GPU.
I’ve used the neural-style transfer TensorFlow code written by Anish Athalye to transform this photo:
…and this artistic style:
…and, with 1,000 iterations, it renders this:
Faster “Feedforward” Precompute-the-Style Approach
The second and much faster approach is to precompute the filter based on artist styles (paper). That appears to be the way that Prisma works, since it’s a whole lot faster.
I’ve managed to get PyTorch installed and configured properly, and don’t need any of the luarocks dependencies and hassle of the main Torch library. In fact, a fast_neural_style transfer example is available via the PyTorch install, in the examples directory.
Wow! It worked in about 10 seconds (on Windows)!
Applying the image with the “Candy” artistic style rendered this image:
Here’s a Mosaic render:
…also took about 5 seconds or so. Amazing. The pre-trained model is so much faster! But on Windows, I had a devil of a time trying to get the actual training of new style models working.
Training New Models (new Artist Styles)
This whole project (as well as other deep learning and data science projects) inspired me to get a working Ubuntu setup going. After a couple hours, I’ve successfully gotten an Ubuntu 18.04 setup, and I’m dual-booting my desktop machine.
The deep learning community and libraries are mostly Linux-first.
After setting up Ubuntu on an NVIDIA-powered machine and installing PyTorch and various libraries, I can now run the faster version of this neural encoding.
Training “Red Balloon” by Paul Klee
To train a new model, you take a massive set of input training images and a “style” painting, and you tell the script to effectively “learn the style”. Training iteratively tries to minimize two weighted losses: one between the original input image and the output image, and one between the “style” image and the output image.
During the training of new models (by default, two “epochs”, or iterations through the image dataset), you can see the loss score for content and style (as well as a weighted total). Notice that the total is declining on the right: the result of training with gradient descent over successive iterations to minimize the overall loss.
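The declining total is easy to reproduce in miniature. The sketch below is a toy, not the PyTorch training script: it replaces the whole network with a single number x, and the targets, weights, learning rate and step count are made-up values. It only shows how gradient descent drives a weighted content-plus-style total downward, step by step.

```python
def losses(x, content_target, style_target, content_weight, style_weight):
    """Return (content, style, weighted total) for the current value of x."""
    content = (x - content_target) ** 2
    style = (x - style_target) ** 2
    return content, style, content_weight * content + style_weight * style


def train(content_target=0.0, style_target=1.0,
          content_weight=1.0, style_weight=3.0,
          learning_rate=0.05, steps=20):
    """Plain gradient descent on the weighted total (all values hypothetical)."""
    x = content_target  # start from the "content" end
    history = []
    for _ in range(steps):
        _, _, total = losses(x, content_target, style_target,
                             content_weight, style_weight)
        history.append(total)
        # Derivative of the weighted total with respect to x.
        gradient = (2 * content_weight * (x - content_target)
                    + 2 * style_weight * (x - style_target))
        x -= learning_rate * gradient
    return x, history


if __name__ == "__main__":
    final_x, totals = train()
    for step, total in enumerate(totals):
        print(f"step {step:2d}  total loss {total:.6f}")
    print(f"final x = {final_x:.4f}")
```

With these values the total falls on every step and x settles near 0.75, the point where the pull toward the content target and the (three times heavier) pull toward the style target balance out.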
I had to install CUDA, the parallel computing platform and library written by the clever folks at NVIDIA. This allows tensor code (matrix math) to be parallelized, harnessing the incredible power of the GPU and dramatically speeding up the process. So far, CUDA is the de facto “machine learning for the masses” GPU library; none of the other major graphics chip makers has a library that is as widely used.
Amazingly, once you have a trained artist-style model — which took about 3.5 hours per input style on my machine — each rendered image in the “style” of an artist takes about a second to render, as you can see in the demo video below. Cool!
For instance, I’ve “trained” the algorithm to learn the following style (Paul Klee’s Red Balloon):
And now, I can take any input image — say, this photo of the Space Needle:
And run it through the PyTorch-based script, and get the following output image:
(One-time) Model training learning the “Paul Klee Red Balloon Style”: 3.4 hours
Application of Space Needle Transform: ~1 second
Learning from this style:
rendering the Eiffel Tower:
looks like this:
Training the Seurat artist model took about 2.4 hours, but once done, it took about 2 seconds to render that stylized Eiffel Tower image.
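As a rough back-of-the-envelope check on when the up-front training pays off, here is a small sketch using the timings quoted in this post. Treat it as an approximation only: the 1 hour 23 minute figure came from the TensorFlow optimization run on a different image and setup, and the function and constant names are hypothetical.

```python
import math


def break_even_images(training_minutes, fast_seconds_per_image,
                      slow_minutes_per_image):
    """Smallest number of images for which training once and then rendering
    quickly beats running the slow optimization on every image."""
    saved_per_image = slow_minutes_per_image - fast_seconds_per_image / 60
    return math.floor(training_minutes / saved_per_image) + 1


# Timings quoted in this post (measured on different setups).
OPTIMIZATION_MINUTES_PER_IMAGE = 83   # 1 hour 23 minutes
KLEE_TRAINING_MINUTES = 3.4 * 60      # "Red Balloon" model
SEURAT_TRAINING_MINUTES = 2.4 * 60    # Seurat model

klee = break_even_images(KLEE_TRAINING_MINUTES, 1,
                         OPTIMIZATION_MINUTES_PER_IMAGE)
seurat = break_even_images(SEURAT_TRAINING_MINUTES, 2,
                           OPTIMIZATION_MINUTES_PER_IMAGE)

if __name__ == "__main__":
    print(f"Klee model pays off from image {klee}")
    print(f"Seurat model pays off from image {seurat}")
```

On these numbers the Klee model comes out ahead from the third image and the Seurat model from the second; after that, every additional image costs a second or two instead of more than an hour.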
I built a simple test harness in Angular with a Flask (Python) back-end to demonstrate these new trained models, and a bash script to let me train new models from images in a fairly hands-off way.
Note how fast the rendering is once the model is complete. Each image is generated on the fly from a Python-powered API based on a learned model, and the final images are not pre-cached:
Really very cool!
Steve’s an entrepreneur and software leader. Steve’s worked on consumer apps, online travel, games, relational databases, management consulting and telecom. He launched Alignvote in 2019, which helped Seattle voters find their best-match political candidates by indexing their existing on-the-record stances, matching them with voters’ own answers to those exact same questions. Alignvote also offered politicians the chance to elaborate on those views. Alignvote is on hiatus for now, but might return in a future election.
Politically, Steve is an independent, and has not registered for any political party. He believes in outcome-based transparent governance; he is a moderate who believes that progressive approaches can be great if truly outcome-focused and evidence-driven, but also that unaccountable spending is a recipe for corruption and little progress. He believes that Seattle’s municipal government must work well for all 724,000+ Seattleites.
Steve’s founded multiple companies. In the early 2000’s, he founded BigOven, the first recipe app for iPhone, with more than 15 million downloads, which was purchased in 2018. Steve served as Chairman of Escapia Inc., the leading SaaS solution for the US vacation rental industry, sold to Homeaway, now part of Expedia. In 1997, Steve was cofounder, President, CEO and Chairman of VacationSpot, a pioneer in the online reservation of vacation rentals, bought by Expedia in January 2000. At Expedia, Steve was Vice President of Vacation Packages, leading the vacation package and destination services teams, helping to create two patents on the first-ever dynamic vacation packaging system on the Internet, which now represents billions in annual transactions for Expedia.
He has keynoted on several occasions at the Vacation Rental Managers Association (VRMA), and taught a graduate level course on the strategic management of innovation at the University of Washington Foster Business School in Seattle, Washington.
Steve worked for Microsoft from 1991 to 1997 in a variety of senior marketing and executive positions, and led the creation of the internet games group, helping develop several products and patents related to online multiplayer gaming. He helped launch Microsoft Access and was involved in the acquisition of Fox Software by Microsoft in 1993. He’s worked for IBM, Booz-Allen Hamilton and Bell Communications Research.
He holds an MS in Computer Science from Stanford University in Symbolic and Heuristic Computation (AI), an MBA from Harvard Business School, where he was named a George F. Baker Scholar (awarded to top 5% of graduating class), and a dual BS in Applied Mathematics / Computer Science and Industrial Management from Carnegie Mellon University (CMU) with University Honors. Steve volunteers when time allows with Habitat for Humanity, University District Food Bank, YMCA Seattle, Technology Access Foundation (TAF) and other organizations in Seattle.