Working with Environment Variables (Tech Note)

Here’s a quick cheatsheet on setting and reading environment variables across common operating systems and languages.

When developing software, it’s good practice to put anything you don’t wish to be public, as well as anything that’s “production-environment-dependent,” into environment variables. These stay with the local machine. This is especially true if you ever publish your code to public repositories like GitHub or Docker Hub.

Good candidates for environment variables are things like database connection strings, paths to files, and so on. Hosting platforms like Azure and AWS also let you easily set the value of these variables on production and testing instances.

I switch back and forth between Windows, OSX and even Linux during development. So I wanted a quick cheatsheet on how to do this.

Writing Variables

Mac OSX (zsh)

The default shell for OSX is now zsh, not bash. If you’re still using bash, consider switching, and consider using the great utility “Oh My Zsh.”

Running printenv (or env) will print out your current environment variables.

To set a new environment variable in the current shell session:

export BASEBALL_TEAM="Seattle Mariners"

Note the quotes: they’re needed because the value contains a space.

To make this permanent, save the value to:

~/.zshenv

So, in the terminal, you’d bring up that file in an editor, e.g.:

nano ~/.zshenv

and add export BASEBALL_TEAM="Seattle Mariners" to this file. Be sure to open a new terminal instance for this to take effect, because ~/.zshenv is only read when a new shell instance is created.

bash shell (Linux, older Macs, and Windows via WSL or Git Bash)

export BASEBALL_TEAM="Seattle Mariners"

echo $BASEBALL_TEAM
Seattle Mariners

printenv
< prints all environment variables >

# permanent setting
nano ~/.bashrc
# place export BASEBALL_TEAM="Seattle Mariners" in this file
# then open a new bash shell

Windows

  • Right-click the Windows icon and select System
  • In the Settings window, under Related settings, click Advanced system settings
  • On the Advanced tab, click Environment Variables
  • Click New to create a new variable, enter its name and value, and click OK (or use the command line shown below)
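You can also set a persistent, per-user variable from a Command Prompt or PowerShell window with the built-in setx command. Note that setx only affects newly opened terminals, not the current session:

setx BASEBALL_TEAM "Seattle Mariners"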

Dockerfile

ENV [variable-name]=[default-value]

ENV BASEBALL_TEAM="Seattle Mariners"
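If you prefer not to bake the value into the image, you can also supply or override it when the container starts, using docker run’s -e flag (the image name below is just a placeholder):

docker run -e BASEBALL_TEAM="Seattle Mariners" my-image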

Reading Variables

Python

import os
​
print(os.environ.get("BASEBALL_TEAM"))

TypeScript / JavaScript / Node

const baseballTeam: string = process.env.BASEBALL_TEAM ?? ''

C#

var bestBaseballTeam = Environment.GetEnvironmentVariable("BASEBALL_TEAM");

Engines of Wow, Part III: Opportunities and Pitfalls

This is the third in a three-part series introducing revolutionary changes in AI-generated art. In Part I: AI Art Comes of Age, we traced back through some of the winding path that brought us to this point. Part II: Deep Learning and The Diffusion Revolution, 2014-present, introduced three basic methods for generating art via deep-learning networks: GANs, VAEs and Diffusion models.

But what does it all mean? What’s at stake? In this final installment, let’s discuss some of the opportunities, legal and ethical questions presented by these new Engines of Wow.

Opportunities and Disruptions

We suddenly have robots which can turn text prompts into relevant, engaging, surprising images for pennies, in a matter of seconds. They can compete with custom-created art that takes illustrators and designers days or weeks to produce.

Anywhere an image is needed, a robot can now help. We might even see side-by-side image creation with spoken words or written text, in near real-time.

  • Videogame designers have an amazing new tool to envision worlds.
  • Bloggers, web streamers and video producers can instantly and automatically create background graphics to describe their stories.
  • Graphic design firms can quickly make logos or background imagery for presentations, and accelerate their work. Authors can bring their stories to life.
  • Architects and storytellers can get inspired by worlds which don’t exist.
  • Entire graphic novels can now be generated from a text script which describes the scenes without any human intervention. (The stories themselves can even be created by new Chat models from OpenAI and other players.)
  • Storyboards for movies, which once cost hundreds of thousands of dollars to assemble, can soon be generated quickly, just by ingesting a script.

It’s already happening. In the Midjourney chat room, user Marcio84 writes: “I wrote a story 10 years ago, and finally have a tool to represent its characters.” With a few text prompts, the Midjourney Diffusion Engine created these images for him for just a few pennies:

Industrial designers, too, have a magical new tool. Inspiration for new vehicles can appear by the dozens and be voted up or down by a community:

Motorcycle concept generated by Midjourney, 2022

These engines are capable of competing with humans. In some surveys, as many as 87% of respondents incorrectly felt an AI-generated image was that of a real person. Think you can do better? Take the quiz.

I bet you could sell the art below, generated by Midjourney from a “street scene in the city, snow” prompt, in an art gallery or Etsy shop online. If I spotted it framed on a wall somewhere, or on a book cover or movie poster, I’d have no clue it was computer-generated:


A group of images stitched together becomes a video. One Midjourney user has tried to envision the gradual destruction of a room, via successive video frames generated from ever-more-decayed descriptions:

These are just a few of the things we can now do with these new AI art generation tools. Anywhere an image is useful, AI tools will have an impact, by lowering cost, blending concepts and styles, and envisioning many more options.

Where do images have a role? Well, that’s pretty much every field: architecture, graphic design, music, movies, industrial design, journalism, advertising, photography, painting, illustration, logo design, training, software, medicine, marketing, education and more.

Disruption

The first obvious impact is that many millions of employed or employable people may soon have far fewer opportunities.

Looking just at graphic design, there are approximately half a million designers employed globally, about 265,000 of whom are in the United States (source: Matt Moran of Colorlib). The total market size for graphic design is about $43 billion per year. 90% of graphic designers work freelance, and the Fortune 500 accounts for nearly one-fifth of graphic design employment.

That’s just the graphic design segment. Then, there are photographers, painters, landscape architects and more.

But don’t count out the designers yet. These are merely tools, just as cameras are. And, while the Internet disrupted (or “disintermediated”) brokers in certain markets in the ’90s and ’00s (particularly in travel, in-person retail, financial services and real estate), I don’t expect that AI-generation tools mean these experts are obsolete.

But the AI revolution is very likely to reduce the dollars available and reshape what their roles are. For instance, travel agents and financial advisers very much do still exist, though their numbers are far lower. The ones who have survived — even thrived — have used the new tools to imagine new businesses and have moved up the value-creation chain.

Who Owns the Ingested Data? Who Compensates the Artists?

Is this all plagiarism of sorts? There are sure to be lawsuits.

These algorithms rely upon massive image training sets. And there isn’t much dotting of i’s and crossing of t’s to secure digital rights. Recently, an artist found her own private medical records in one publicly available training dataset on the web which has been used by Stability AI. You can check whether your own images have been part of the training datasets at www.haveibeentrained.com.

But unlike most plagiarism and “derivative work” lawsuits up until about 2020, these lawsuits will need to contend with not being able to firmly identify just how the works are directly derivative. Current caselaw around derivative works generally requires some degree of sameness or likeness from input to final result. But the body of imagery which goes into training the models is vast. A given creative end-product might be associated with hundreds of thousands or even millions of inputs. So how do the original artists get compensated, and how much, if at all?

No matter the algorithm, all generative AI models rely upon enormous datasets, as discussed in Part II. That’s their food. They go nowhere without creative input. And these datasets are the collective work of millions of artists and photographers. While some AI researchers go to great lengths to ensure that the images are copyright-free, many (most?) do not. Web scraping is often used to fetch and assemble images, and then a lot of human effort is put into data-cleansing and labeling.

The sheer scale of “original art” that goes into these engines is vast. It’s not unusual for a model to be trained on 5 million images. So these generative models learn patterns in art from millions of samples, not just by staring at one or two paintings. Are they “clones”? No. Are they even “derivative”? Probably, but not in the same way that George Harrison’s “My Sweet Lord” was derivative of Ronnie Mack’s “He’s So Fine.”

In the art world, American artist Jeff Koons created a collection called Banality, which featured sculptures from pop culture: mostly three dimensional representations of advertisements and kitsch. Fait d’Hiver (Fact of Winter) was one such work, which sold for approximately $4.3 million in a Christie’s auction in 2007:

The sculpture was both inspired by and derived from this advertisement, created by Franck Davidovici:


It’s plain to the eye the work is derivative.

And in fact, that was the whole point: Koons brought to three dimensions some of the banality of everyday kitsch. In a legal battle spanning four years, Koons’ lawyers argued unsuccessfully that such derivative work was still unique, on several grounds: he had turned it into three dimensions, added a penguin, put goggles on the woman, applied color, changed her jacket and the material representing snow, changed the scale, and much more.

While derivative, with all these new attributes, wasn’t the work then brand new? The French court said non. Koons was found guilty in 2018. And it wasn’t the first time: of the five lawsuits which sprang from the Banality collection, Koons lost three, and another settled out of court.

Unlike other “derivative works” lawsuits of the past, generative AI models rely not upon one work of a given artist, but upon an entire body of millions of images from hundreds of thousands of creators. Photographs are often lumped in with artistic sketches, oil paintings, graphic novel art and more to fashion new styles.

And, while it’s possible to look into the latent layers of AI models and see vectors of numbers, it’s impossible to translate that into something akin to “this new image is 2% based on image A64929, and 1.3% based on image B3929, etc.” An AI model learns patterns from enormous datasets, and those patterns are not well articulated.

Potential Approaches

It would be possible, it seems to me, to pass laws requiring that AI generative models use properly licensed (i.e., copyright-free or royalty-paid) images, and then divvy up those royalties amongst the creators. Each artist places a different value on their work, so presumably they’d set the prices and AI model trainers would either pay them or not.

Compliance is another matter entirely; perhaps certification technologies could issue valid tokens once ownership is verified. Similar to the blockchain concept, perhaps all images would have to be traceable to some payment or royalty agreement or license. Or perhaps Non-Fungible Tokens (NFTs) could be used to license out ownership for ingestion during the training phase. Obviously this would have to be scalable, so it suggests automation, and a de facto standard would have to emerge.

Or will we see new kinds of art comparison or “plagiarism” tools, letting artists measure similarity and influence between generated works and their own creations? Perhaps if a generated piece of art is found to be more than, say, 95% similar to an existing work, it will not retain copyright and/or will require licensing of the underlying work. It’s possible to build such comparative tools today.
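For instance, here is a minimal sketch of one naive similarity check using perceptual hashing, assuming the Pillow and imagehash Python libraries; the file names are placeholders, and real influence detection would need far more sophisticated, embedding-based comparisons:

from PIL import Image
import imagehash

# perceptual hashes summarize an image's visual structure into a short fingerprint
original = imagehash.phash(Image.open("original_artwork.png"))
generated = imagehash.phash(Image.open("generated_artwork.png"))

distance = original - generated  # Hamming distance between the two fingerprints
print("likely similar" if distance <= 5 else "likely different", distance)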

In the meantime, it’s a Wild West of sorts. As has often happened in the past, technology’s rapid pace of advancement has gotten ahead of legislation and of how the money flows.

What’s Ahead

If you’ve come with me on this journey into AI-generated art in 2022, or have seen these tools up close, you’re like someone who saw the World Wide Web in 1994. You’re on the early end of a major wave. This is a revolution in its nascent stage. We don’t know all that’s ahead, and the incredible capabilities of these tools are known only to a tiny fraction of society at the moment. It is hard to predict all the ramifications.

But if prior disintermediation moments are any guide, I’d expect change to happen along a few axes.

First, advancement and adoption will spread from horizontal tools to many more specialized verticals. Right now, there’s great advantage to being a skilled “image prompter,” but I suspect that, like photography, which initially required real expertise to produce even passable results, these engines will get better at delivering remarkable images on the first pass. Time and again in technology, generalized “horizontal” applications have concentrated into an oligopoly of a few tools (e.g., spreadsheets), while launching a thousand flowers in much more specialized “vertical” ones (accounting systems and other domain applications). I expect the same pattern here. These tools have come of age, but only a tiny fraction of people know about them; we’re still in the generalist period, where the horizontal engines stun with their potential across a wide variety of uses, and they keep getting better. I’d expect thousands of specialty, domain-specific applications and brand names to emerge: book illustration, logo design, storyboard design, automated graphic design for blog posts, and so on. One set of players might generate sketches for book covers, another for graphic novels, another for video-streaming backgrounds. Not only will this make the training datasets more specific and the outputs even more relevant, it will also let each engine’s brand penetrate a specific customer base and respond to its needs.

Second, many artists will demand compensation and seek to restrict rights to their work. Perhaps new guilds will emerge. A new technology and payments system will likely emerge to allow this to scale. Content generally has many ancillary rights, and one of those rights will likely be “ingestion rights” or “training-model rights.” I would expect micropayment solutions, or perhaps some form of blockchain-based technology, to allow photographers, illustrators and artists to protect their work from being ingested into models without compensation. This might emerge as some kind of paywall approach to individual imagery. As is happening in the world of music, the most powerful and influential creative artists may initiate this trend by cordoning off their entire collective body of work. For instance, the Ansel Adams estate might decide to disallow all ingestion into training models; right now, however, it’s very difficult to prove whether or not particular images were used to train a given model.

Third, regulation might be necessary to protect vital creative ecosystems. If an AI generative machine is able to create works auctioned at Christie’s for $5 million, and it may well do so soon, what does this do to the ecosystem of creators? Regulators may need to protect the creative ecosystem that feeds these derivative engines, restricting AI model-makers from simply fetching and ingesting any image they find.

Fourth, in the near term, skilled “image prompters” are like skilled photographers, web graphic designers, or painters. Today, there is a noticeable difference between those who know how to get the most out of these new tools and those who do not. For the short term, this is likely to “gatekeep” the technology and validate the expertise of designers. I do not expect this to be especially durable, however; the quality of output from very unskilled prompters (e.g., yours truly) already meets or exceeds a lot of the royalty-free art available from the likes of Envato or Shutterstock.

Conclusion

Machines now seem capable of visual creativity. While their output is often stunning, under the covers, they’re just learning patterns from data and semi-randomly assembling results. The shocking advancements since just 2015 suggest much more change is on the way: human realism, more styles, video, music, dialogue… we are likely to see these engines pass the artistic “Turing test” across more dimensions and domains.

For now, you need to be plugged into geeky circles of Reddit and Discord to try them out. And skill in crafting just the right prompts separates talented jockeys from the pack. But it’s likely that the power will fan out to the masses, with engines of wow built directly into several consumer end-user products and apps over the next three to five years.

We’re in a new era, where it costs pennies to envision dozens of new images visualizing anything from text. Expect some landmark lawsuits to arrive soon on what is and is not derivative work, and whether such machine-learning output can even be copyrighted. For now, especially if you’re in a creative field, it’s good advice to get acquainted with these new tools, because they’re here to stay.

Engines of Wow: Part II: Deep Learning and The Diffusion Revolution, 2014-present

A revolutionary insight in 2015, plus AI work on natural language, unleashed a new wave of generative AI models.

In Part I of this series on AI-generated art, we introduced how deep learning systems can be used to “learn” from a well-labeled dataset. In other words, algorithmic tools can “learn” patterns from data to reliably predict or label things. Now on their way to being “solved” via better and better tweaks and rework, these predictive engines are magical power-tools with intriguing applications in pretty much every field.

Here, we’re focused on media generation, specifically images, but it bears a note that many of the same basic techniques described below can apply to songwriting, video, text (e.g., customer service chatbots, poetry and story-creation), financial trading strategies, personal counseling and advice, text summarization, computer coding and more.

Generative AI in Art: GANs, VAEs and Diffusion Models

From Part I of this series, we know at a high level how we can use deep-learning neural networks to predict things or add meaning to data (e.g., translate text, or recognize what’s in a photo.) But we can also use deep learning techniques to generate new things. This type of neural network system, often comprised of multiple neural networks, is called a Generative Model. Rather than just interpreting things passively or searching through existing data, AI engines can now generate highly relevant and engaging new media.

How? The three most common types of Generative Models in AI are Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs) and Diffusion Models. Sometimes these techniques are combined. They aren’t the only approaches, but they are currently the most popular. Today’s star products in art-generating AI are Midjourney by Midjourney.com (Diffusion-based), DALL-E by OpenAI (VAE-based), and Stable Diffusion by Stability AI (Diffusion-based). It’s important to understand that each of these algorithmic techniques was conceived just in the past six years or so.

My goal is to describe these three methods at a cocktail-party-chat level. The intuitions behind them are incredibly clever ways of thinking about the problem. There are lots of resources on the Internet which go much further into each methodology, listed at the end of each section.

Generative Adversarial Networks

The first strand of generative-AI models, Generative Adversarial Networks (GANs), have been very fruitful for single-domain image generation. For instance, visit thispersondoesnotexist.com. Refresh the page a few times.

Each time, you’ll see highly* convincing images like this, but never the same one twice:

As the domain name suggests, these people do not exist. This is the computer creating a convincing image, using a Generative Adversarial Network (GAN) trained to construct a human-like photograph.

*Note that for the adult male, it only rendered half his glasses. This GAN doesn’t really understand the concept of “glasses,” simply a series of pixels that need to be adjacent to one another.

Generative Adversarial Networks were introduced in a 2014 paper by Ian Goodfellow et al. That was just eight years ago! The basic idea is that you have two deep-learning neural networks: a Generator and a Discriminator. You can think of them as a Counterfeiter and a Detective, respectively. The Discriminator (Detective) learns to distinguish between genuine articles and counterfeits, penalizing the Generator for producing implausible results. Meanwhile, the Generator (Counterfeiter) learns to produce plausible data; whenever its output “fools” the Discriminator, that output becomes negative training data for the Discriminator. They play a zero-sum game against each other (thus “adversarial”) thousands and thousands of times. With each adjustment to their weights and attributes, the Generator gets better and better at constructing something that fools the Discriminator, and the Discriminator gets better and better at detecting fakes.

The whole system looks like this:

Generative Adversarial Network, source: Google
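To make the loop concrete, here is a minimal sketch of adversarial training in Python, assuming PyTorch; the tiny fully connected networks and the sizes are illustrative stand-ins, not any production GAN architecture:

import torch
import torch.nn as nn

latent_dim, image_dim = 64, 784   # illustrative sizes (e.g., flattened 28x28 images)

generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, image_dim), nn.Tanh())
discriminator = nn.Sequential(
    nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

def training_step(real_images):
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Train the Discriminator ("Detective"): real images -> 1, generated fakes -> 0
    fakes = generator(torch.randn(batch, latent_dim)).detach()
    d_loss = bce(discriminator(real_images), real_labels) + bce(discriminator(fakes), fake_labels)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Train the Generator ("Counterfeiter"): try to make the Detective output 1
    g_loss = bce(discriminator(generator(torch.randn(batch, latent_dim))), real_labels)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()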

GANs have delivered pretty spectacular results, but in fairly narrow domains. For instance, GANs have been pretty good at mimicking artistic styles (called “Neural Style Transfer”) and Colorizing Black and White Images.

GANs are cool and a major area of generative AI research.

More reading on GANs:

Variational Autoencoders (VAE)

An encoder can be thought of as a compressor of data, and a decoder as something which does the opposite. You’ve probably compressed an image down to a smaller size without losing recognizability. It turns out you can use AI models to compress an image; data scientists call this reducing its dimensionality.

What if you built two neural network models, an Encoder and a Decoder? It might look like this, going from x, the original image, to x’, the “compressed and then decompressed” image:

Variational Autoencoder, high-level diagram. Images go in on the left and come out on the right. If you train the networks to minimize the difference between output and input, you get a compression algorithm of sorts. What’s left in red is a lower-dimensional representation of the images.

So conceptually, you could train an Encoder neural network to “compress” images into vectors, and then a Decoder neural network to “decompress” the image back into something close to the original.

Then, you could consider the red “latent space” in the middle as basically the Rosetta Stone for what a given image means. Run that algorithm over many labeled images, encoding each with its label text, and you end up with a condensed encoding of how to render various kinds of images. If you did this across many, many images and subjects, these red vectors would overlap in n-dimensional space, and could be sampled and mixed and then run through the decoder to generate new images.

With some mathematical tricks (specifically, forcing the latent variables in red to conform to a normal distribution), you can build a system which can generate images that never existed before, but which have some very similar properties to the dataset which was used to train the encoder.
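Here is a minimal sketch of that idea in Python, assuming PyTorch. The layer sizes are illustrative, and the “mathematical trick” mentioned above shows up as the KL term that nudges the red latent vectors toward a normal distribution:

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Sketch of a variational autoencoder: image -> latent vector -> image."""
    def __init__(self, image_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Linear(image_dim, 128)
        self.to_mu = nn.Linear(128, latent_dim)       # mean of the latent distribution
        self.to_logvar = nn.Linear(128, latent_dim)   # log-variance of the latent distribution
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, image_dim), nn.Sigmoid())

    def forward(self, x):
        h = F.relu(self.encoder(x))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # sample a point in the "latent space" (the red vectors in the diagram)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

def vae_loss(x, x_recon, mu, logvar):
    # reconstruction term: how close is the "decompressed" image to the original?
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # KL term: push latent vectors toward a normal distribution so we can sample new ones later
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl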

More reading on VAEs:

2015: “Diffusion” Arrives

Is there another method entirely? What else could you do with a deep learning system which can “learn” how to predict things?

In March 2015, a revolutionary paper came out from researchers Sohl-Dickstein, Weiss, Maheswaranathan and Ganguli. It was inspired by the physics of non-equilibrium systems: for instance, dropping a drop of food coloring into a glass of water. Imagine you saw a film of that process of “destruction,” and could stop it frame by frame. Could you build a neural network to reliably predict what the reverse might look like?

Let’s think about a massive training set of animal images. Imagine you take an image in your training dataset and create multiple copies of it, each time systematically adding graphic “noise.” Step by step, more noise is added to your image (x), via what mathematicians call a Markov chain (incremental steps). The distortion you apply is normally distributed: Gaussian noise.

In a forward direction, from left to right, it might look something like this. At each step from left to right, you’re going from data (the image) to pure noise:

Adding noise to an image, left to right. Credit: image from “AI Summer”: How diffusion models work: the math from scratch | AI Summer (theaisummer.com)

But here’s the magical insight behind Diffusion models. Once you’ve done this, what if you trained a deep learning model to predict frames in the reverse direction? Could you predict a “de-noised” image x(t) from its noisier version, x(t+1)? Could you read each step backward, from right to left, and try to predict the best way to remove noise at each step?

This was the insight in the 2015 paper, albeit with much more mathematics behind it. It turns out you can train a deep learning system to learn how to “undo” noise in an image, with pretty good results. For instance, if you input the pure-noise image from the last step, x(T), and train a deep learning network so that its output should be the previous step, x(T-1), and do this over and over again with many images, you can “train” a deep learning network to subtract noise from an image, all the way back to an original image.
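As a rough illustration, here is a minimal sketch of the forward (noise-adding) half of that process in Python, assuming PyTorch; the schedule values are common textbook choices, not the ones any particular product uses:

import torch

T = 1000                                  # number of steps in the Markov chain
betas = torch.linspace(1e-4, 0.02, T)     # how much noise to mix in at each step
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def noisy_version(x0, t):
    """Jump directly to step t: blend the original image x0 with Gaussian noise."""
    noise = torch.randn_like(x0)
    a_t = alphas_cumprod[t]
    x_t = a_t.sqrt() * x0 + (1.0 - a_t).sqrt() * noise
    return x_t, noise

# During training, a network is shown x_t (and t) and asked to predict the noise that
# was added; applying that prediction in reverse, step by step, is the "undo the noise"
# operation that turns pure noise back into an image at generation time.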

Do this enough times, with enough terrier images, say. And then, ask your trained model to divine a “terrier” from random noise. Gradually, step by step, it removes noise from an image to synthesize a “terrier”, like this:

Screen captured video of using the Midjourney chatroom (on Discord) to generate: “terrier, looking up, cute, white background”

Images generated from the current Midjourney model:

“terrier looking up, cute, white background” entered into Midjourney. Unretouched, first-pass output with v3 model.

Wow! Just slap “No One Hates a Terrier” on any of these images above, print 100 t-shirts, and sell it on Amazon. Profit! I’ll touch on some of the legal and ethical controversies and ramifications in the final post in this series.

Training the Text Prompts: Embeddings

How did Midjourney know to produce a “terrier”, and not some other object or scene or animal?

This relied upon another major parallel track in deep learning: natural language processing. In particular, word “embeddings” can be used to get from keywords to meanings. And during the image model training, these embeddings were applied by Midjourney to enhance each noisy image with meaning.

An “embedding” is a mapping of a chunk of text into a vector of continuous numbers; think of a word as a list of numbers. The textual unit could be a word, a node in a graph, or a relation between nodes in a graph. By ingesting massive amounts of text, you can train a deep learning network to understand relationships between words and entities, and to pull out, numerically, how closely associated some words and phrases are with others. These vectors can capture, in mathematical terms a computer can appear to understand, how similar expressions are in meaning or sentiment. For instance, embedding models are now able to capture semantic relationships between words, like “royalty + woman – man = queen.”
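Here is a small sketch of that vector arithmetic in Python, assuming the gensim library and its downloadable pretrained GloVe vectors; this illustrates the general idea, not the embedding model Midjourney itself uses:

import gensim.downloader as api

# downloads roughly 100MB of pretrained word vectors the first time it runs
vectors = api.load("glove-wiki-gigaword-100")

# "king" - "man" + "woman" lands near "queen" in the vector space
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))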

An example on Google Colab took a vocabulary of 50,000 words in a collection of movie reviews, and learned over 100 different attributes from words used with them, based on their adjacency to one another:


Source: Movie Sentiment Word Embeddings

So, if you simultaneously injected into the “de-noising” diffusion-based learning process the information that this is about a “dog, looking up, on white background, terrier, smiling, cute,” you can get a deep learning network to “learn” how to go from random noise (x(T)) to a very faint outline of a terrier (x(T-1)), to even less faint (x(T-2)) and so on, all the way back to x(0). If you do this over thousands of images, and thousands of keyword embeddings, you end up with a neural network that can construct an image from some keywords.

Incidentally, researchers have found that T=1000 steps is about all you need in this process, but millions of input images and enormous amounts of computing power are needed to learn how to “undo” noise at high resolution.

Let’s step back a moment to note that this revelation about Diffusion Models was only really put forward in 2015, and improved upon in 2018 and 2020. So we are just at the very beginning of understanding what might be possible here.

In 2021, Dhariwal and Nichol showed convincingly that diffusion models can achieve image quality superior to the existing state-of-the-art GAN models.

Up next, Part III: Ramifications and Questions

That’s it for now. In the final Part III of Engines of Wow, we’ll explore some of the ramifications, controversies and make some predictions about where this goes next.

Engines of Wow: AI Art Comes of Age

Advancements in AI-generated art test our understanding of human creativity and laws around derivative art.

While most of us were focused on Ukraine, the midterm elections, or simply returning to normal as best we can, Artificial Intelligence (AI) took a gigantic leap forward in 2022. Seemingly all of a sudden, computers are now eerily capable of human-level creativity. Natural language agents like GPT-3 are able to carry on an intelligent conversation. GitHub CoPilot is able to write major blocks of software code. And new AI-assisted art engines with names like Midjourney, DALL-E and Stable Diffusion delight our eyes, but threaten to disrupt entire creative professions. They raise important questions about artistic ownership, derivative work and compensation.

In this three-part blog series, I’m going to dive into the brave new world of AI-generated art. How did we get here? How do these engines work? What are some of the ramifications?

This series is divided into three parts:

[featured image above: “God of Storm Clouds” created by Midjourney AI algorithm]

But first, why should we care? What kind of output are we talking about?

Let’s try one of the big players, the Midjourney algorithm. Midjourney lets you play around in their sandbox for free for about 25 queries. You can register for free at Midjourney.com; they’ll invite you to a Discord chat server. After reading a “Getting Started” agreement and accepting some terms, you can type in a prompt. You might go with: “/imagine portrait of a cute leopard, beautiful happy, Gryffindor outfit, super detailed, hyper realism.”

Wait about 60 seconds, choose one of the four samples generated for you, click the “upscale” button for a bigger image, and voila:

image created by the Midjourney image generation engine, version 4.0. Full prompt used to create it was “portrait of a cute leopard, Beautiful happy, Gryffindor Outfit, white background, biomechanical intricate details, super detailed, hyper realism, heavenly, unreal engine, rtx, magical lighting, HD 8k, 4k”

The Leopard of Gryffindor was created without any human retouching. This is final Midjourney output. The algorithm took the text prompt, and then did all the work.

I look at this image, and I think: Stunning.

Looking at it, I get the kind of “this changes everything” feeling, like the first time I browsed the world-wide web, spoke to Siri or Alexa, rode in an electric vehicle, or did a live video chat with friends across the country for pennies. It’s the kind of revolutionary step-function that causes you to think “this will cause a huge wave of changes and opportunities,” though it’s not even clear what they all are.

Are artists, graphic designers and illustrators doomed? Will these engines ultimately help artists or hurt them? How will the creative ecosystem change when it becomes nearly free to go from idea to visual image?

Once mainly focused on just processing existing images, computers are now extremely capable of generating brand new things. Before diving into a high-level overview of these new generative AI art algorithms, let me emphasize a few things. First, no artist has ever created exactly the above image before, nor will it likely be generated again. That is, Midjourney and its competitors (notably DALL-E and Stable Diffusion) aren’t search engines: they are media creation engines.

In fact, if you typed this same exact prompt into Midjourney again, you’d get an entirely different image, yet one which is also likely to deliver on the prompt fairly well.

There is an old joke within Computer Science circles that “Artificial Intelligence is what we call things that aren’t working yet.” That’s now sounding quaint. AI is all around us, making better and better recommendations, completing sentences, curating our media feeds, “optimizing” the prices of what we buy, helping us with driving assistance on the road, defending our computer networks and detecting spam.

Part I: The Artists in the Machine, 1950-2015+

How did this revolutionary achievement come about? Two ways, just as bankruptcy came about for Mike Campbell in Hemingway’s The Sun Also Rises: First gradually. Then suddenly.

Computer scientists have spent more than fifty years trying to perfect art generation algorithms. These five decades can be roughly divided into two distinct eras, each with entirely different approaches: “Procedural” and “Deep Learning.” And, as we’ll see in Part II, the Deep-Learning era had three parallel but critical deep learning efforts which all converged to make it the clear winner: Natural Language, Image Classifiers, and Diffusion Models.

But first, let’s rewind the videotape. How did we get here?

Procedural Era: 1970’s-1990’s

If you asked most computer users, the naive approach to generating computer art would be to encode various “rules of painting” into software, via the very “if this, then that” kind of logic that computers excel at. And that’s precisely how it began.

In 1973, the British-born artist and computer scientist Harold Cohen, then in residence at Stanford University’s Artificial Intelligence Laboratory (SAIL), created AARON, the first computer program dedicated to generating art. Cohen was an accomplished, talented artist as well as a computer scientist, and he thought it would be intriguing to try to “teach” a computer how to draw and paint.

His thinking was to encode various “rules about drawing” into software components and then have them work together to compose a complete piece of art. Cohen relied upon his skill as an exceptional artist, and coded his own “style” into his software.

AARON was an artificial intelligence program first written in the C programming language (a low level language compiled for speed), and later LISP (a language designed for symbolic manipulation.) AARON knew about various rules of drawing, such as how to “draw a wavy blue line, intersecting with a black line.” Later, constructs were added to combine these primitives together to “draw an adult human face, smiling.” By 1995, Cohen added rules for painting color within the drawn lines.

Though there were aspects of AARON which were artificially intelligent, by and large computer scientists call this a procedural approach. Do this, then that. Pick up a brush, pick an ink color, and draw from point A to B. Construct an image from its components. Join the lines. And you know what? After a few decades of work, Cohen created some really nice pieces, worthy of hanging on a wall. You can see some of them at the Computer History Museum in Mountain View, California.

In 1980, AARON was able to generate this:

Detail from an untitled AARON drawing, ca. 1980, via the Computer History Museum

By 1995, Cohen had encoded rules of color, and AARON was generating images like this:

The first color image created by AARON, 1995, via The Computer Museum, Boston, MA

Until just a few years ago, other attempts at AI-generated art looked flat and derivative, like this image from 2019:


Twenty-seven years after AARON’s first AI-generated color painting, algorithms like Midjourney would be quickly rendering photorealistic images from text prompts. But to accomplish it, the primary method is completely different.

Deep Learning Era (1986-Present)

Algorithms which can create photorealistic images-on-demand are the culmination of multiple parallel academic research threads in learning systems dating back several decades.

We’ll get to the generative models which are key to this new wave of “engines of wow” in the next post, but first, it’s helpful to understand a bit about their central component: neural networks.

Since about 2000, you have probably noticed everyday computer services making massive leaps in predictive capabilities; that’s because of neural networks. Turn on Netflix or YouTube, and these services will serve up ever-better recommendations for you. Or, literally speak to Siri, and she will largely understand what you’re saying. Tap on your iPhone’s keyboard, and it’ll automatically suggest which letters or words might follow.

Each of these systems relies upon trained prediction models built by neural networks. And to envision them, a group of computer scientists and mathematicians had to radically shift their thinking away from the procedural approach. They did so first in the 1950’s and 60’s, and then again in a machine-learning renaissance which began in earnest in the mid-1980’s.

The key insight: these researchers speculated that instead of procedural coding, perhaps something akin to “intelligence” could be fashioned from general purpose software models, which would algorithmically “learn” patterns from a massive body of well-labeled training data. This is the field of “machine learning,” specifically supervised machine learning, because it’s using accurately pre-labeled data to train a system. That is, rather than “Computer, do this step first, then this step, then that step”, it became “Computer: learn patterns from this well-labeled training dataset; don’t expect me to tell you step-by-step which sequence of operations to do.”

The first big step began in 1958. Frank Rosenblatt, a researcher at Cornell University, created a simplistic precursor to neural networks, the “Perceptron,” basically a one-layer network consisting of visual sensor inputs and software outputs. The Perceptron system was fed a series of punchcards. After 50 trials, the computer “taught” itself to distinguish those cards which were marked on the left from cards marked on the right. The computer which ran this program was a five-ton IBM 704, the size of a room. By today’s standards, it was an extremely simple task, but it worked.

A single-layer perceptron is the basic component of a neural network. A perceptron consists of input values, weights and a bias, a weighted sum and activation function:

Frank Rosenblatt and the Perceptron system, 1958

Rosenblatt described it as the “first machine capable of having an original idea.” But the Perceptron was extremely simplistic; it merely added up the optical signals it detected to “perceive” dark marks on one side of the punchcard versus the other.
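In code, a single perceptron is almost trivially small. Here is a minimal sketch in Python; the weights, bias and inputs are made-up illustrative numbers, loosely echoing the left-mark/right-mark task above:

def perceptron(inputs, weights, bias):
    # weighted sum of the inputs, plus a bias...
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    # ...passed through a step "activation function"
    return 1 if weighted_sum > 0 else 0

# e.g., two optical sensors: fires (returns 1) when the left mark is detected
print(perceptron(inputs=[1, 0], weights=[0.7, -0.4], bias=-0.5))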

In 1969, MIT’s Marvin Minsky, whose father was an eye surgeon, wrote convincingly that neural networks needed multiple layers (like the optical neuron fabric in our eyes) to really do complex things. But his book Perceptrons, though well-respected in hindsight, got little traction at the time. That’s partially because, during the intervening decades, the computing power required to “learn” more complex things via multi-layer networks was out of reach. But time marched on, and over the next three decades, computing power, storage, languages and networks all improved dramatically.

From the 1950’s through the early 1980’s, many researchers doubted that computing power would be sufficient for intelligent learning systems via a neural network style approach. Skeptics also wondered if models could ever get to a level of specificity to be worthwhile. Early experiments often “overfit” the training data and simply output the input data. Some would get stuck on local maxima or minima from a training set. There were reasons to doubt this would work.

And then, in 1986, Carnegie Mellon Professor Geoffrey Hinton, whom many consider the “Godfather of Deep Learning” (go Tartans!), demonstrated that neural networks could learn to predict shapes and words by statistically “learning” from a large, labeled dataset. Hinton’s revolutionary 1986 breakthrough was the concept of “backpropagation.” This approach adds multiple layers to the model (hidden layers), and iterates through the network, using the output of one or more mathematical functions to adjust weights so as to minimize “loss,” the distance from the expected output.

This is rather like the golfer who adjusts each successive golf swing, having observed how far off their last shots were. Eventually, with enough adjustments, they calculate the optimal way to hit the ball to minimize its resting distance from the hole. (This is where terms like “loss function” and “gradient descent” come in.)
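Here is a minimal sketch of that “adjust after each swing” idea in Python: plain gradient descent on a toy, one-weight loss function. Backpropagation applies the same kind of update to every weight in every layer, using the chain rule to compute each gradient; the numbers here are purely illustrative:

def loss(w):
    # how far off the model is, as a function of a single weight
    return (w - 3.0) ** 2

def gradient(w):
    # slope of the loss at w (the derivative of (w - 3)^2)
    return 2 * (w - 3.0)

w = 0.0                  # initial guess
learning_rate = 0.1
for step in range(50):
    w -= learning_rate * gradient(w)   # a small correction after each "swing"

print(round(w, 3))       # converges toward 3.0, the weight that minimizes the loss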

In 1986-87, around the time of the 1986 Hinton-Rumelhart-Williams paper on Backpropagation, the whole AI field was in flux between these procedural and learning approaches, and I was earning a Masters in Computer Science at Stanford, concentrating in “Symbolic and Heuristic Computation.” I had classes which dove into the AARON-style type of symbolic, procedural AI, and a few classes touching on neural networks and learning systems. (My masters thesis was in getting a neural network to “learn” how to win the Tower of Hanoi game, which requires apparent backtracking to win.)

In essence, you can think of a neural network as a fabric of software-represented units (neurons) waiting to soak up patterns in data. The methodology to train them is: “here is some input data and the output I expect, learn it. Here’s some more input and its expected output, adjust your weights and assumptions. Got it? Keep updating your priors. OK, let’s keep doing that.” Like a dog learning what “sit” means (do this, get a treat / don’t do this, don’t get a treat), neural networks are able to “learn” over iterations, by adjusting the software model’s weights and thresholds.

Do this enough times, and what you end up with is a trained model that’s able to “recognize” patterns in the input data, outputting predictions, or labels, or anything you’d like classified.

A neural network, and in particular the special type of multi-layered network called a deep learning system, is “trained” on a very large, well-labeled dataset (i.e., with inputs and correct labels.) The training process uses Hinton’s “backpropagation” idea to adjust the weights of the various neuron thresholds in the statistical model, getting closer and closer to “learning” the underlying pattern in the data.

For much more detail on Deep Learning and the mathematics involved, see this excellent overview:

Deep Learning Revolutionizes AI Art

We’ll rely heavily upon this background of neural networks and deep learning in Part II: The Diffusion Revolution. The AI art revolution uses deep learning networks to interpret natural language (text to meaning), to classify images, and to “learn” how to synthetically build an image from random noise.

Gallery

Before leaving, here are a few more images created from text prompts on Midjourney:

You get the idea. We’ll check in on how deep learning enabled new generative approaches to AI art, namely Generative Adversarial Networks, Variational Autoencoders and Diffusion, in Part II: Engines of Wow: Deep Learning and The Diffusion Revolution, 2014-present.

I’m Winding Down HipHip.app

After much thought, I’ve decided to wind down the video celebration app I created, HipHip.app.

All servers will be going offline shortly.

Fun Project, Lots of Learning

I started HipHip as a “give back” project during COVID. I noticed that several people were lamenting online that they were going to miss big milestones in-person: celebrations, graduations, birthdays, anniversaries, and memorials. I had been learning a bunch about user-video upload and creation, and I wanted to put those skills to use.

I built HipHip.app, a celebration video creator. I didn’t actually know at the time that there were such services — and it turns out, it’s a pretty crowded marketplace!

While HipHip delivered hundreds of great videos for people in its roughly two years on the market, it struggled to be anything more than a hobby/lifestyle project. It began under the unique circumstances of lockdown, helping people celebrate. That purpose was well served!

Now that the lockdown/remote phase of COVID is over, the economics showed that it’s unlikely to turn into a self-sustaining business any time soon. There are some category leaders with really strong search-engine presence, which is pretty expensive to dislodge.

I want to turn my energies to other projects, and free up time and budget for other things. COVID lockdown is over, and a lot of people want a respite from recording and Zoom-like interactions, including me.

It was a terrific, educational project. It kept me busy, learning, and productive. HipHip delivered hundreds of celebration videos for people around the world.

I’ve learned a ton about programmatic video creation, technology stacks like Next.js, Azure and React, and likely will apply these learnings to new projects, or perhaps share them with others via e-learning courses.

Among the videos people created were graduation videos, videos to celebrate new babies, engagement, birthdays, anniversaries and the attainment of US Citizenship.

In the end, the CPU processing and storage required for online video creation meant that it could not be done for free forever, and after testing a few price points, there seems to be only so much willingness to pay in a crowded market.

Thanks everyone for your great feedback and ideas!

Introducing Seattlebrief.com: Local Headlines, Updated Every 15 Minutes

I’ve built a news and commentary aggregator for Seattle. Pacific Northwest headlines, news and views, updated every 15 minutes.

Seattlebrief.com is now live in beta.

Its purpose is to let you quickly get the pulse of what Seattleites are writing and talking about. It rolls up headlines and commentary from more than twenty local sources from across the political spectrum.

It indexes headlines from places like Crosscut, The Urbanist, Geekwire, The Stranger, Post Alley, Publicola, MyNorthwest.com, City Council press releases, Mayor’s office press releases, Q13FOX, KUOW, KOMO, KING5, Seattle Times, and more. It’s also indexing podcasts and videocasts from the Pacific Northwest, at least those focused on civic, community and business issues in Seattle.

Seattle isn’t a monoculture. It’s a vibrant mix of many different people, many communities, neighborhoods, coalitions and voices. But there are also a lot of forces nudging people into filtered silos. I wanted to build a site which breaks away from that.

Day to day, I regularly hit a variety of news feeds and listen to a lot of different news sources. I wanted to make that much easier for myself and everyone in the city.

Seattlebrief.com is a grab-a-cup-of-coffee site. It is designed for browsing, very intentionally, not search. Click on the story you’re interested in, and the article will open up in a side window. It focuses on newsfeeds which talk about civic and municipal issues over sports, weather and traffic.

I’ll consider Seattlebrief.com a success if it saves you time catching up on local stories, or introduces you to more voices and perspectives in this great city.

How it works

There are so many interesting and important voices out there, from dedicated news organizations like The Seattle Times to more informal ones like neighborhood blogs. I wanted a quick way to get the pulse of what’s happening. Seattlebrief pulls from the RSS feeds of more than twenty local sites, from all sides of the political spectrum: news sites, neighborhood blogs, municipal government announcements, and activist organizations. The list will no doubt change over time.

Many blog sites and news organizations support Really Simple Syndication (RSS) to publish their latest articles for syndication elsewhere. For instance, you can find Post Alley’s RSS feed here. RSS is used to power Google News and podcast announcements, among other things.

RSS is a structured feed, published by a site, which tells aggregation sites “here are the recent stories,” usually including a photo thumbnail, author information, and a description. Seattlebrief uses these self-declared RSS feeds, currently from over 20 sources in and around Seattle, and regularly checks what’s new. Another job then fetches each page and “enriches” these articles with the social-sharing metadata that is used to mark up the page for, say, sharing on Facebook or Twitter.

Think of it as a robot that simply goes out to a list of sites, fetches the “social sharing” info for each story, and puts the stories in chronological order (by publication date) for you.
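As a rough illustration of the fetch step, here is a minimal sketch in Python using the feedparser library; the feed URL is a placeholder, and Seattlebrief’s real pipeline adds the enrichment and scheduling described above:

import feedparser

# parse one source's self-declared RSS feed
feed = feedparser.parse("https://example.com/feed/")   # placeholder URL

for entry in feed.entries:
    # each entry carries the "here are the recent stories" fields described above
    print(entry.get("published", "n/a"), "|", entry.title, "|", entry.link)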

Origin

Over at Post Alley, where I sometimes contribute, there was a writers’ room discussion about the Washington Post’s popular “Morning Mix” series. Morning Mix highlights interesting/viral stories around the web.

Sparked by that idea, I wanted to build a way to let me browse through the week’s Seattle-area headlines and commentary more easily. So I built Seattlebrief.

I’d welcome suggestions for any key sources I’ve missed. Right now, they must have an RSS feed. Regrettably, some important and thoughtful voices, like KUOW, decommissioned their RSS feeds long ago. I’m exploring what might be possible there.

Drop me a note.

I’d love it if you checked out Seattlebrief.com, and let me know your thoughts.

Farewell, Facebook

The time has come to de-prioritize Facebook in my life. Here are some of the steps I’m taking if you too are considering it.

Happy New Year 2022!

I’ve decided to deprioritize Facebook in my life. I made this decision back in autumn, but decided to stick it out to be able to engage with people up to and through Seattle’s recent elections.

Engaging on Facebook has taken up more time than I care to admit over the past several years. I joined Facebook in 2007, three years after its founding. During that year, I invited a lot of friends to it. Over the ensuing thirteen years, I’ve made 5,387 posts, and uploaded over 2 gigabytes of photos and video to it.

From about 2014 onward, I’ve used Facebook as a journal of sorts. I’ve posted vacation photos and family updates. But unlike many people who wisely stay away from politics and controversy, I’ve also shared news items and articles and predictions which interest me, and on more than one occasion they’ve run against the grain of a very deep blue political sentiment among family and friends at the moment. I am a huge advocate of breaking one’s own filter-bubble, and I have felt that too many Americans have succumbed to an ever-narrower range of news sources.

I’ve really enjoyed hearing from friends and family on controversial issues, learning from perspectives which aren’t always my own. Put another way, areas of universal agreement are far less interesting to me. Since I always kept Facebook friends to true friends in real life (a cardinal rule throughout), these interactions have nearly always been incredibly respectful and polite. I’ve only had to unfriend one friend and former colleague, out of more than 400 Facebook friends. It was when I called the lab leak hypothesis by far the most credible to me, early on in the pandemic.

As early as late January 2020, before even the first American had died of COVID, I thought Occam’s Razor had something to say:

My friend took instant and strong offense, and considered this to be a racist viewpoint. Remember when polite society equated the lab-leak hypothesis with racism? I found that odd then and still do today; if anything, the “wet market” and “bat soup” explanations, which were among the original ones floated, seemed the far more culturally insensitive hypotheses. Mistakes happen all the time, even hugely consequential ones. Moreover, I could easily envision myself as a well-intentioned and expert researcher, normally highly careful, inadvertently responsible for a very random or extremely rare accident or unthinking moment of carelessness. Look at Chernobyl, Three Mile Island, or the Exxon Valdez: these disasters were not intentional, and it was vital to understand how they happened.

Today, most Americans believe a lab leak to be the most likely cause. It’s at least acceptable to discuss in polite company, even in places like New York Magazine. I’ve repeatedly stated that I don’t think it was intentional, if it was indeed an accident, but I lost a friend over it. He kept jumping in with snide and insulting comments, going ad hominem without ever bothering to engage with the actual substantial and growing circumstantial evidence, much of which the 20+ year New York Times science journalist Don McNeil eventually chronicled in a must-read piece, How I Learned to Stop Worrying And Love the Lab-Leak Theory.

Look, I’m independent. That brings incredible luxury. It means I don’t have to check my tribe’s opinion before voicing my own. And I don’t accept the fashionable rhetorical trick that just because one reprehensible person holds a given view, that anyone else holding such a view must buy into the panoply of their ideas. Adolf Hitler loved dogs, after all; this doesn’t mean dog owners must defend Mein Kampf. No, what matters are the facts and evidence, and the logic and merits of the argument. As a data-guy, evidence is essential to how I think.

Beyond the lab-leak hypothesis, I have had several at-the-time controversial or heterodox opinions over the past several years. I was posting about the high likelihood of a coming pandemic wave to my Facebook friends as early as the first week of February 2020, before the first reports of US infections. I’ve been skeptical about the efficacy of cloth mask mandates starting months ago, after trying and failing to find correlation between mask mandates and changes in spread. I’ve felt we are not doing enough to separate positives from worrisome positives, when few were discussing the idea that PCR tests might be over-sensitive, depending upon the number of cycle-thresholds run. I’ve been harshly critical of the harms of prolonged school closures. I’ve predicted significant inflation from unbridled easy monetary policy, and predicted inflation’s likely durability when we were repeatedly told it was “transitory.” These are just a few examples of discussions which first emerged on my own Facebook threads, and then sometimes headed to my blog. I’m far from infallible. But I think history is very much on my side with respect to each and every one of these once-highly-controversial but now generally accepted viewpoints. They certainly weren’t always what people wanted to hear at the time. At least not sprinkled amidst family updates and vacation and pet photos.

Though discussions like this are intriguing, I don’t think using Facebook in this way is necessarily the best pastime to be my healthiest in 2022 and beyond.

A big concern too is that Facebook (and absolutely, Twitter and YouTube) are narrowing the range of acceptable conversation, through what I consider to be highly undesirable censorship. And they’re harvesting data from us, and manipulating what’s shown to us which amplifies misinformation and can cause emotional harm.

But putting aside for a moment the very important data sharing/mining, censorship and manipulation concerns, there’s also the matter of using the right tool for the job. I really should be updating my blog more. To be sure, I got to the point where Facebook became a bit of a “here’s a controversial issue I’m thinking about” journal, to which I would then, over the ensuing weeks or months, add evidence and articles supporting my predictions and views. One of my personal goals has been to write more. But more than once, my wife gently asked me, “Um, why are you down in your office, replying to yourself on Facebook?”

Thinking about leaving Facebook too? Be sure to get a backup of everything you’ve posted. Go into Facebook’s account settings and request a download of your Facebook data. A day or so later, you’ll see a set of files you can download, in either HTML or JSON form. (Personally, I recommend JSON, in case you ever want to import the data into another tool in the future.)
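Once the archive is downloaded, it’s easy to skim programmatically. Here’s a minimal Python sketch; the file path and field names are assumptions about how the export is laid out (Facebook has changed it over time), so adjust them to whatever your own download actually contains.

# skim a downloaded Facebook JSON archive
# NOTE: the path and keys below are assumptions -- check your own export's layout
import json
from datetime import datetime, timezone

ARCHIVE_POSTS = "facebook-export/posts/your_posts_1.json"  # hypothetical path

with open(ARCHIVE_POSTS, encoding="utf-8") as f:
    posts = json.load(f)

for post in posts:
    ts = post.get("timestamp")
    when = datetime.fromtimestamp(ts, tz=timezone.utc).date() if ts else "unknown date"
    # post text tends to sit inside a nested "data" list -- again, an assumption
    text = next((d["post"] for d in post.get("data", []) if "post" in d), "")
    print(f"{when}: {text[:80]}")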

While full deactivation and deletion of my account altogether is very tempting, I still have a couple of startup-specific reasons not to deactivate my Facebook account entirely. And I know I have a few services out there using my Facebook login, so I want to leave it live for a few months while I change those.

So, here are the steps I’ve taken to introduce lots of friction into booting-up-Facebook:

  • I’ve deleted the Facebook and Messenger Apps from my phone
  • I’ve installed the excellent UnDistracted Chrome extension on all the desktop and laptop browsers I use. Since I generally use Edge and Chrome, it covers the browsers where I spend 95% of my time.
  • I’ve signed out of Facebook on all browsers
  • When I sign into services requiring “Log In with Facebook”, I’m taking a moment to change the login method.

My friend Marcelo Calbucci has done a nice blog post on a 12-Step Program To Eliminate Facebook in Your Life if this is of interest to you.

Twitter’s New CEO and Freedom of Speech

What does the changing of the guard at Twitter suggest about free speech online?

This first appeared in Seattle’s Post Alley on December 9th, 2021

On its face, Jack Dorsey’s resignation as CEO of Twitter last week was just another rearrangement of the Silicon Valley furniture, albeit an outsized one, given Dorsey’s iconic stature. At a different level though, the move opened a new chapter in the debate about social media platforms, regulation, the future of the internet, and ultimately how we define and allow free speech.

As the Jack Dorsey era of Twitter Inc. came to a close, a ten-year Twitter veteran, Parag Agrawal, formerly its Chief Technical Officer, took the helm. Agrawal’s appointment portends a faster-moving company. But it may also signal an even more algorithmically filtered, boosted and suppressed conversation in the years to come.

How so?

Imagine you had an algorithm which could instantly calculate the health or danger of any given tweet or online conversation. For instance, it might look in on a substantive, respectful debate amongst career astrophysicists and assign a positive score of 4,852,325 and climbing, but evaluate a hateful, racist tirade at -2,439,492, trending lower and lower.

Wouldn’t such an algorithm be something you could use to make Twitter conversations healthier? Couldn’t it also be used to block bad actors and boost desired, good-faith discussion, thereby reducing harm and promoting peace? The man who once championed this tantalizing, risky idea within Twitter is its new Chief Executive Officer, Parag Agrawal.

So what, you say. But therein lies a fierce and philosophical battle raging about how free speech should be defined, measured, protected and even suppressed online.

Twitter has become the dominant tool for the world’s real-time news consumers and purveyors — which in a sense, is all of us. It has been especially useful for journalists, particularly in the understanding of and reporting upon real-time news events. It’s become highly influential in politics and social change. And as a result, the CEO of the Internet’s real-time public square is far more than a tech executive. Ultimately, he and the people he appoints, pays, and evaluates, cast the deciding vote on consequential questions, such as: what news, political and scientific discussion are allowed to be consumed? What are we allowed to know is happening right now? Who determines what can be expressed, and what cannot? What gets boosted, and who gets booted?

Dorsey stepped down via an email to his team on November 29, 2021, posting it to where else but his Twitter account. He wrote that being “founder-led” can be a trap for a company, and he wanted Twitter to grow beyond him. He expressed “bone deep” trust and confidence in successor and friend, Agrawal.

But there were likely other reasons behind his decision too. Dorsey’s been public about wanting to spend more time promoting Bitcoin; he’s called being a Bitcoin missionary his “dream job.” There’s also the small matter that his other day job has been running Block Inc. (formerly known as Square), the ubiquitous small-business payments processor, which is now worth more than $80 billion and employs more than 5,000 people. Adding to the incentive: Dorsey also owns a much bigger personal stake in Block than he does in Twitter.

A final, less discussed contributor might be simmering investor dissatisfaction. Twitter’s stock price is languishing in the mid-$40s, the same trading range it occupied eight years ago:

Snapshot taken December 9th, 2021

And its user numbers, while growing, have not shown the rapid growth many investors expect. Twitter’s user growth has been small compared to Facebook, Instagram and TikTok.

Activist investors Elliott Management and its ally Silver Lake Partners own significant stakes in Twitter, and they pushed for new leadership and faster innovation. According to Axios, while Elliott Management resigned its board seat in 2020, it demanded and got two things in return: new management, and a plan to increase the pace of innovation. Also looming large are regulatory moves, debates over user safety and privacy, and controversy over moderation.

Agrawal has impressive technical chops. He earned a BS in Computer Science and Engineering from the prestigious Indian Institute of Technology (IIT), then a PhD in Computer Science from Stanford University in 2012. He worked brief stints at Microsoft Research, AT&T Labs and Yahoo before joining Twitter in 2011, rising through the ranks over ten years. He has led Twitter’s machine learning efforts, and he’s been intimately involved in a research project called “BlueSky,” a decentralized, peer-to-peer social network protocol.

Agrawal has moved quickly, shaking up Twitter’s leadership team. Head of design and research Dantley Davis is stepping down — the scuttlebutt is that Dantley demonstrated an overly blunt and caustic management style that rubbed too many employees the wrong way. Head of engineering Michael Montano is also departing by year’s end. Agrawal’s lines of authority are now more streamlined; he has expressed a desire to impose more “operational rigor.”

“We want to be able to move quick and make decisions, and [Agrawal] is leading the way with that,” said Laura Yagerman, Twitter’s new head of corporate communications. Agrawal’s swift change in key leadership positions suggests that Dorsey didn’t leave entirely of his own volition.

While Dr. Agrawal brings deep technical experience to the role of CEO, most outside observers are focused intently on his viewpoints regarding free speech and censorship.

Every day, voters, world leaders, journalists and health officials turn to Twitter to exchange ideas. As I write this today, the public square is pondering the dangers (or potentially nascent optimistic signs) of a new COVID variant. Foreign policy Twitter is abuzz about troops massing on Ukraine’s border and China’s activities in both Hong Kong and the South China Sea. Law enforcement Twitter is asking the public for crowdsourced detective work on the latest tragic homicides.

What they all have in common is this: these stories often come to the world’s attention via Twitter. Twitter decides which types of speech should be off-limits on its platform. They say who gets booted, and what gets boosted. In other words, they have a big role in defining the collective Overton Window of online conversation. Ultimately, Twitter’s moderation policies, algorithms and (for now at least) human editorial team decide what can and cannot be said, what gets amplified, and what gets shut down.

Further, our world increasingly conflates the concepts of internet “consensus” and truth. So how do we go about deciding what information is true, and what is gaslighting? Which sources will Twitter deem “credible” and which untrustworthy? What labels will get slapped on your tweets?

The CEO of Twitter has an enormously powerful role in determining what does and doesn’t come to the public’s attention, what catches fire and what does not, and who is anointed with credibility. Agrawal knows this intimately; it’s been a big part of his work for the past several years. Twitter’s servers process nearly one billion tweets every day. And usage has swelled to nearly 220 million daily active users, with few signs of slowing.

More important, perhaps, is the highly influential nature of these users. Seth Godin called such influencers “sneezers of the Idea Virus.” Watch any cable TV news channel for more than fifteen minutes, and you’re likely to encounter someone talking about what someone has tweeted. Indeed a very high number of politicians, journalists, celebrities, government and policy officials use Twitter regularly, either to spread, consume or evaluate information. Twitter’s moderation policies can act quickly to fan an ember, or snuff it out.

During Dorsey’s tenure, Twitter came under withering fire for too-hastily suppressing and blocking views. It’s also come under fire for the opposite reason — not being fast enough to block and remove misinformation (for instance “Gamergate,” and later “QAnon” and communication surrounding January 6th.)

Most recently, concern over Twitter’s moderation policies and its blocking, amplification and suppression decisions has been fiercest from civil libertarians, the right, and center-right. Among the examples:

  • In October 2020, just weeks before the presidential election, Twitter blocked the New York Post for its explosive scoop on Hunter Biden’s laptop. Twitter first said the ban was because the materials were hacked, though to this day there is no definitive proof they were obtained that way. Subsequent reporting by Politico this year independently confirmed the authenticity of several of those emails. The New York Post was prevented from participating on Twitter for weeks leading up to the 2020 election. Dorsey later apologized for this blocking, calling it a “total mistake,” though he wouldn’t say who made it.
  • Twitter locked the account of the White House Press Secretary for retweeting that Biden laptop story.
  • In October 2020, Twitter suspended former Border Patrol Commissioner Mark Morgan for tweeting favorably about the border wall.
  • Twitter temporarily banned and then permanently suspended Donald Trump, a sitting president of the United States, citing his repeated violations of terms of service, specifically its Glorification of Violence policy. Yet Twitter does not ban organizations like the Taliban, nor does it suspend world leaders who threaten Israel’s existence; it generally only removes individual tweets.

Even before several of the incidents above, in a 2018 interview at NYU, Dorsey admitted that Twitter’s conservative employees often don’t feel comfortable expressing their opinions. He conceded both that Twitter is often gamed by bad-faith actors, and that he’s not sure Twitter will ever build a perfect antidote to that manipulation. In 2020, a massive hack exposed the fact that Twitter has administrative banning and suppression tools, which among other things allow its employees to prevent certain topics from trending, and which likely also block users and/or specific tweets from showing up in “trending” sections and searches.

As Twitter’s influence rose, these decisions caused consternation among some lawmakers, and Dorsey was pressed to sit before multiple Congressional hearings in which he was asked about these instances and more.

One big issue is “bots” (short for robots): automated programs which use Twitter’s platform and act as users. They amplify certain memes by posting content to Twitter, liking and retweeting certain content, replying affirmatively to things they are programmed to agree with and negatively to things they are not, and so on. They are a great example of how Twitter, in its “wild west” initial era, often let its platform be manipulated.

Twitter’s initial response was slow; one has to remember that bots help amplify usage numbers, which in turn might help create a feeling of traction (and/or ad-ready eyeballs). But bots are often designed with malicious intent to skew the public’s perception of what’s catching fire, or to add credibility to false stories. Since 2016, Twitter has gotten more aggressive about cleaning out bots, and in 2018 it greatly restricted use of its application programming interface (API). Earlier this year, after years of hedging, Twitter finally took aggressive action to de-platform the conspiracy fringe group QAnon, suspending 70,000 related accounts after the January 6th riot at the United States Capitol. Dorsey regretted that this ban was done “too late.”

The justification for these interventions often centers around harm. Or perhaps more accurately, it centers around what Twitter’s human and algorithmic decisionmakers judge in the snapshot moment to be “harmful.”

What’s Agrawal’s attitude about free speech? While some civil libertarians and commentators on the political right initially cheered Dorsey’s departure, that enthusiasm quickly cooled. That’s because Agrawal has in the past signaled very clearly that he believes Twitter’s censorship policy should not be about free speech, but about reducing harm and even improving peace. You can get an idea of Agrawal’s philosophy from his extended remarks to MIT Technology Review in November 2018:

“[Twitter’s] role is not to be bound by the First Amendment, but our role is to serve a healthy public conversation and our moves are reflective of things that we believe lead to a healthier public conversation. The kinds of things that we do about this is, focus less on thinking about free speech, but thinking about how the times have changed. One of the changes today that we see is speech is easy on the internet. Most people can speak. Where our role is particularly emphasized is who can be heard. The scarce commodity today is attention. There’s a lot of content out there. A lot of tweets out there, not all of it gets attention, some subset of it gets attention. And so increasingly our role is moving towards how we recommend content and that sort of, is, is, a struggle that we’re working through in terms of how we make sure these recommendation systems that we’re building, how we direct people’s attention is leading to a healthy public conversation that is most participatory.” (Emphasis added.)

In 2010, he tweeted:

The double negative is a bit cryptic, but one interpretation is that book banning might be not only acceptable but even desirable if it increases societal peace. This sentiment is most definitely not aligned with those who believe, as I generally do, that the best antidote to speech with which you disagree is more and better speech.

As one wag put it, “I’ll be happy to ban all forms of hate speech, as long as you let me define what it is. Deal?”

More of Agrawal’s outlook can be discerned from his November 2018 interview with MIT Technology Review. From roughly 2015 to 2018, he and the rest of the technical team at Twitter put great effort into determining whether the health of any given public conversation can be scored algorithmically. Thus far, that effort appears to have yielded disappointment. Yet Agrawal seems undaunted in this quest.

Agrawal’s Holy Grail of algorithmic scoring of the “health” or potential “harm” of a public conversation isn’t yet fully possible. Thus, they employ humans to curate discussion, and block, ban, suppress and promote (through sorting) certain expressions over others. Now, given that human editors are expensive, Twitter focuses them on a few subjects. Agrawal specifically names pandemic response and election integrity as two areas which he deems most appropriate for such intervention. Yet let’s keep in mind he also clearly believes that automated algorithmic “scoring” of healthy conversation is both possible and desirable.

Our approach to it isn’t to try to identify or flag all potential misinformation. But our approach is rooted in trying to avoid specific harm that misleading information can cause.

Dr. Parag Agrawal, Twitter’s new CEO, MIT Technology Review November 2018

While controlling discussion to promote peace might seem to be an unalloyed good, it’s not at all clear that a harm-reducing, peace-promoting Internet public square is also necessarily a truth-promoting one. For one thing, truth doesn’t care about its impact. And it isn’t always revealed right away. Our understanding and interpretation of facts change over time. It seems increasingly often that things which we once “knew” to be certain are suddenly revealed to be quite different. Would such an algorithm optimize for the wrong things, leaving us less informed in the process? These and other conundra confront Twitter’s new CEO, who took office last week.

In a way, Agrawal’s appointment as Twitter CEO can be seen as an important waypoint in the Internet’s transformation from techno-libertarianism to a much more progressive worldview with a willingness to use a heavier hand. Anti-establishment civil libertarianism used to typify many Internet and tech leaders’ outlook. Yet quite steadily over the past decade, a progressive worldview has grown dominant. One side holds free speech and civil liberties as paramount; the other believes societal peace, equity, and least “harm” trump other goals. And for some, if free speech needs to be sacrificed to achieve that, so be it. Throughout his tenure, Dorsey himself has shown elements of each philosophy.

Agrawal may be a technocrat progressive. In 2017, he donated to the ACLU so it could sue President Trump. He has also likened religion to a pyramid scheme.

Yet it would be highly inaccurate to characterize Agrawal as a censorship extremist. He advocates more open access to Twitter’s archives through Application Programming Interfaces (APIs), and more third-party analysis of what’s discussed on the platform.

One hopeful sign is that Agrawal has already experienced his own “my old tweets have been taken greatly out of context” moment immediately after being named Twitter’s new CEO. Critics on the right seized on an October 26th, 2010 tweet of his, suggesting it somehow demonstrated that he was equating white people with racists.

But as he quickly explained, “I was quoting Asif Mandvi from The Daily Show,” noting that his intent was precisely the opposite. Agrawal was joking about the harm of stereotypes. He was of course not making a factual statement, but using sarcasm to make a larger point.

As someone who tends to side with civil libertarians on free speech, I hope he remembers that it was his ability to clarify and respond with more speech that conveyed his true meaning, far more clearly than the first 280 characters could. Wasn’t it better for him that he could quickly dispel a controversy and continue to engage, rather than be banned because some lower-level employee determined his first tweet caused harm under at least one subjective interpretation?

Perhaps the central conundrum is that content moderation is impossible to get perfectly “fair” or least-harm-imposing. No algorithm or human will be able to make the correct decision at every moment. Thus, guidelines need to exist which define an optimal content moderation policy. For that, you need the platform’s leader to define what should be optimized via such a policy. Truth? Liberty? Fairness? Viewpoint Diversity? Peace?

Back to the thought-exercise which started this piece. Would everyone score the “harm” of a given conversation the same way, or the credibility or intent of the speakers? Obviously, we wouldn’t. Algorithms — especially machine learning algorithms — are tremendously powerful, but they can also give an illusion of objective authority. In reality, they’re only as good as their training data and evaluation sets, and those evaluation sets have explicit goals. Each desirable metric chosen to optimize (Peace, Truth, Viewpoint Diversity, etc.) would yield a very different algorithm. And the result would be very different content moderation, amplification and suppression policies.

Agrawal’s view will likely evolve, but for the moment, he appears to prioritize what he considers “healthy conversation” and avoidance of “harm.” It is how he actually defines health and harm which will be very important indeed. For it will determine what we know to be true, and from whom we hear.

Jack Dorsey’s resignation letter concludes with this statement: “My one wish is for Twitter Inc. to be the most transparent company in the world.”

That would be most welcome. But they have a very long way to go indeed. Godspeed, Dr. Agrawal.

Which Seattle candidate most agrees with you? Take the all-new Alignvote quiz.

Which candidate most agrees with you in the Seattle 2021 general election? Take the all-new Alignvote quiz to find out.

I’ve just released a brand new version of Alignvote, completely rewritten and updated for the November 2021 Seattle general election:

alignvote.com

There are four voter-candidate matchmakers for key Seattle races: Mayor, City Council 8, City Council 9, and City Attorney. And, assuming forums and questionnaires continue to develop in the race for King County Executive, there will likely be one for that race soon as well.

Are you just getting up to speed on some of the big issues in these races, and curious where the candidates stand on some key questions? Drop by Alignvote, and take the quiz. Then share it with your friends.

https://alignvote.com

Questions are sourced from candidate forums, direct questions placed to the candidates, and their positions as outlined on their campaign websites, on-the-record statements and elsewhere.

Know any undecideds?

In a poll released today, some 65% of voters were undecided in the City Attorney race, and 27% were undecided for Mayor.

Here’s a sample question, for the Mayor’s race. This question was posed at a recent forum, and answered by both general election candidates. Which one do you agree with? You can move the “Importance” slider between seven values, from “irrelevant” to “essential.”
Example question from the Mayor’s race, which was asked in a recent forum

All candidates were notified last Thursday and invited to provide additional elaboration if they’d like.

New Features

  • The voter-candidate matchmaker is now embeddable on any website, and I’d be happy to have it embedded on your site or blog. If you have a news site or blog covering city politics or the Seattle election, grab the snippet of code to embed the Alignvote quiz.
  • There’s a new “Evidence” section at the end of the stack-rankings. That section will include links to relevant news stories, tweets, commentary and more which are directly related to the candidate’s own views on the question at hand. This will likely be growing between now and November. If you have relevant stories or links to include, jot a tweet or Direct Message to @alignvote on Twitter.
  • Easier administration. A great deal of effort went into easier administration on the back-end, and I have rewritten the code entirely from Angular/Material to React/NextJS.

On Controversy and Bias

In the 2019 cycle, Alignvote delivered over 20,000 voter-candidate rankings, and certainly generated some controversy.

Alignvote measures the level of match between you and the eight candidates in the four races above. You and the candidate answer the same questions. Alignvote simply scores the distance between your answer and each candidate’s, weights it by the importance you assign to that question, and sums it up. The candidate with the smallest overall weighted distance comes out on top.
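To make the scoring idea concrete, here is a minimal Python sketch of importance-weighted distance ranking. The question names, answer scale and candidate positions below are hypothetical placeholders, not Alignvote’s actual data or code.

# importance-weighted distance ranking (illustrative sketch; the questions,
# scales and candidate positions below are hypothetical, not Alignvote's real data)

def weighted_distance(voter_answers, voter_importance, candidate_answers):
    """Sum of |voter - candidate| per question, scaled by the voter's importance weight."""
    return sum(
        voter_importance.get(q, 1) * abs(choice - candidate_answers[q])
        for q, choice in voter_answers.items()
    )

# hypothetical inputs: answers on a 1-5 agree/disagree scale, importance on a 1-7 slider
voter = {"police_staffing": 2, "upzoning": 5}
importance = {"police_staffing": 7, "upzoning": 3}
candidates = {
    "Candidate A": {"police_staffing": 1, "upzoning": 4},
    "Candidate B": {"police_staffing": 4, "upzoning": 5},
}

# smallest weighted distance = closest match
ranked = sorted(candidates, key=lambda name: weighted_distance(voter, importance, candidates[name]))
print(ranked)  # ['Candidate A', 'Candidate B']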

How are questions sourced? As noted above, from candidate forums, direct questions placed to the candidates, and their on-the-record positions. Of course, these aren’t the only questions which should matter to a voter, but they are ones where the candidates often have differing viewpoints and where they have made their stances clear.

As for me, the guy behind this project, like all voters and writers, I have political views. I have expressed them here on my blog, and I will continue to do so. I am not unbiased. My own views may not match your own. This is true of any blogger, tweeter, activist-blogger, TV personality or mainstream journalist covering politics.

For what it’s worth, on political quizzes and by Gallup polling, I generally score as a centrist, and I have supported Democratic, Independent and Republican candidates with varying ideologies over the years. And, since it seems a highly relevant indicator to Seattleites, I’ve never voted for Trump.

But I also realize that the term “centrist” is a subjective label. I value great public schools, affordable and convenient transit options, help for those who need it, good and accountable government, green parks, limits on services for those who refuse to partner, transparent metrics for the public which funds services, more affordable housing inventory, better solutions for those experiencing mental health or substance use crises in their lives and improved health and environmental outcomes for all.

I think the controversy arose in part because (a) some political writers and tweeters with relatively large followings really, really don’t like it when policy tradeoff questions are framed in any terms other than the favorable ones they prefer, and (b) many candidates like to “tack toward the center” in the general election and therefore do not want to be pinned down to multiple-choice options. They want the freedom to be all things to all voters for a general election.

But leadership, including civic leadership, is often about tradeoffs. If there were solutions to long-term controversies that had easy “no cost” answers, they’d have been done by now.

I think voters deserve more clarity.

To be sure, the selection of any set of questions in any poll or survey or candidate forum can absolutely result in bias.

Controversial issues can be framed in a number of ways.

Alignvote simply shows the level of match between you and the candidates on the questions. There are many opportunities to hear open-ended answers to questions (interviews, forums, meet-and-greets and more.) By design, to help voters quickly identify closest-match-to-them, Alignvote relies upon closed-ended questions, where both you and the candidate must commit to one of the answers.

And Alignvote lets candidates elaborate on why they chose the answer they did. Campaigns were all emailed these questions on Thursday, September 23rd, and I would be happy to include their elaborations for voters to read. (They should allow up to 72 hours for elaborations to go live.)

Find it useful? Share it, and follow on social media.

I’m very gratified to hear directly from many of you that it’s been very helpful. It’s an entirely free civic project, and is not funded by any campaign or political organization — its very modest costs are solely funded by me.

If you like it, please share it with fellow Seattleites, via email, Twitter, Facebook, Nextdoor, Reddit or word of mouth. Or just give a follow on social media: you’ll find me at @stevemur and Alignvote at @alignvote. And please vote in November!

photo credit: Nitish Meena

Generating Sitemap from Scully Routes

Super techie post, but if you’re using Angular, there’s finally a way to deliver fast-loading pages with great SEO: use the prerendering framework Scully in your build/deploy pipeline.

I created HipHip.app, a website and iPhone app that lets you create surprise celebration videos quickly and easily.

The website is written in Angular.

Angular is incredibly powerful, but until very recently it had two drawbacks: first-time download performance and search engine optimization. These are natural consequences of so-called “Single Page Application” (SPA) frameworks like Angular. Rather than rendering a new page on every request, SPAs are written in JavaScript, downloaded in bigger chunks of code, and build out the user interface for each route on the client side.

To help bring some of these benefits back to Angular developers, the Angular team has released Angular Universal, a “server-side rendering” technology. But in my own experience, at least through Angular v11, it’s not quite ready for prime time, with a whole lot of workarounds needed for many common Angular components and libraries. Among other things, when you use Angular Universal, any use of the “window” object in your code or third-party components has to be rewritten or worked around.

Enter Scully, a Fast Static Site Builder

But recently, a team of developers took a different approach and created Scully, which lets you easily pre-render the majority of the routes on your website. With Scully, at build/test/deploy time, a Puppeteer bot is spun up in an Express application and crawls your ready-for-production site in a variety of configurable, smart ways. You end up with a bunch of static index.html pages that are ready to go, along with a bootstrapper that loads your Angular application after page download. The result is that a user and a search engine spider each see a very fast initial page with most of the necessary content, while the rest of the app downloads in the background.

It brings the JAMStack philosophy to Angular.

So, I’ve added Scully to HipHip‘s build & deployment stack, resulting in much faster load times for end-users. I’ve been happy with it.

And over time, it should also lead to better search engine optimization for the site.

Sitemap.xml

If you’ve read this far, you probably already know what a sitemap is, but in short it’s a file which tells search engines which links you’d like them to crawl. It’s a helpful asset in a search engine optimization strategy, because it can point the search engines to the routes you most care about, and away from the ones you don’t.

Rendering a Sitemap from Scully Output

Since Scully generates a scully-routes.json file (you’ll see it in your “assets” subfolder by default after Scully completes its work), the natural next step in the build process for better SEO is to simply walk through that JSON file and create a sitemap.xml.

Here’s some quick Python code to do just that, which I’ve added as a step in my validator/build routine. It builds valid sitemap.xml files, and you can use it as a starting point for other work if you’d like.

Sharing here in case other Angular developers find this useful.

I place the file below, generate-sitemap.py, in my “src” subfolder, and run the script after Scully completes its work in the build process. Error and exception handling is left up to the reader. Obviously, change “https://hiphip.app” in the code below to your own root URL.

# generate-sitemap.py
import json


def generate_sitemap():
    # specific routes to ignore
    ignore_list = ["/onem", "/for"]

    # route prefixes to ignore
    ignore_startswith = ["/utils", "/dash", "/login", "/logout", "/create", "/account", "/error", "/settings"]

    print("Generating sitemap.xml from ./assets/scully-routes.json")

    # read the routes Scully discovered during its crawl
    with open('./assets/scully-routes.json') as routes_file:
        data = json.load(routes_file)

    with open("sitemap.xml", "w") as out:
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        out.write('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n')

        for route in data:
            path = route["route"]
            if path in ignore_list or path.startswith(tuple(ignore_startswith)):
                print("SKIPPING: " + path)
                continue
            print(path)
            out.write('<url>\n')
            out.write('<loc>https://hiphip.app' + path + '</loc>\n')  ## << CHANGE to your root URL
            out.write('<changefreq>daily</changefreq>\n')
            out.write('</url>\n')

        out.write('</urlset>')

    print("SITEMAP GENERATED SUCCESSFULLY. Saved to src/sitemap.xml")


# run as a build step, after Scully finishes
if __name__ == "__main__":
    generate_sitemap()