
The Fandom of the Opera: How the Audience for a Centuries-Old Art Form Helped Create Modern Media Technology

November 20th, 2014 | No Comments | Posted in Download, Schubin Cafe

Presented as part of National Opera Week at Stevens Institute of Technology, College of Arts & Letters, Hoboken, NJ on October 30, 2014.

Believe it or not, electronic home entertainment was invented for opera audiences. So were consumer headphones, movies, newscasts, and pay-cable. The first sportscasts were in opera houses. The first wireless broadcast? The first commercial digital recording? The first live subtitles? All opera.

The idea of transmitting opera motion pictures and sounds live to theaters worldwide appeared in print in 1877, to homes in 1882. Without opera, there might not be communications satellites. And, according to pioneering radiologist Percy Brown, “No opera, no X-rays!”

The first opera recordings were made 17 years before Edison’s first phonograph, and 76 years before that an automaton played opera music for Marie Antoinette. In the 21st century, labs around the world are working on ultra-high-speed communications systems for opera and have discussed neutrino communications and quantum entanglement.

Galileo, Kepler, Lavoisier, Matisse – all had opera-technology connections. Stereo sound? The laryngoscope? Broadcast rights? All for opera. Really. Watch and be amazed.

Direct Link (151 MB / 1:06:16 TRT): The Fandom of the Opera-Stevens



The Habit and “The Hobbit”

February 5th, 2013 | 2 Comments | Posted in Schubin Cafe

Here are a couple of questions to get you started: What is the image at left? And what is the sound of a telephone call?

I’ll offer some more information about the first one. It’s an “intertitle,” the sort of thing inserted into silent movies to help advance their plot.

This one happens to be from a pretty famous movie. Got any idea yet of which one? You’re likely to be familiar with it even if you never saw it. But the answer might be surprising.

Now, how about that telephone call? Bell Labs researcher and audio pioneer Harvey Fletcher wanted its sound to be indistinguishable from being there in person. Today, if you use a certain type of mobile phone, you might be able to identify certain negative artifacts, but, in general, with contemporary technology, Fletcher’s dream has been achieved: a telephone call sounds pretty much like any other reproduction of an electronic audio signal. And that’s a problem.

When the kidnapper calls to demand ransom in a movie or TV thriller, the camera might offer a close-up of the person taking the call, but the kidnapper’s voice shouldn’t sound like it’s coming from the same room. So a voice filter is used, typically restricting the bandwidth of the sound to a range from roughly 300 Hz to 3 kHz as shown at the right in the Cisco white paper “Wideband Audio and IP Telephony” <http://bit.ly/116U1Mn>.

If you’re familiar with sampling theory, you know that, to avoid spurious frequencies known as aliases, sampling must be done at a rate higher than twice the desired highest frequency, and the signal must be filtered to prevent anything higher than that highest desired frequency from entering the sampler. Filters are imperfect, so, if a telephone company wanted to sample 8,000 times per second, it would not be totally unreasonable for the system to pass little more than 3 kHz.
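That fold-back of out-of-band frequencies can be sketched in a few lines of Python. This is a toy illustration of the sampling arithmetic, not anything a telephone company actually runs: any tone above half the sampling rate reappears as a lower-frequency alias.

```python
def alias_frequency(f, fs):
    """Apparent frequency, in Hz, after sampling a pure tone of f Hz
    at fs samples per second. Any component above fs/2 folds back
    into the 0..fs/2 band -- this is the alias the input filter
    is there to prevent."""
    return abs(f - fs * round(f / fs))

fs = 8000  # telephone sampling rate, samples per second
for f in (1000, 3000, 3400, 5000, 7000):
    print(f, "Hz tone appears as", alias_frequency(f, fs), "Hz")
```

A 5 kHz tone sampled at 8,000 samples per second is indistinguishable from a 3 kHz tone, which is exactly why the filter ahead of the sampler has to stop passing audio well before 4 kHz.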

Digital transmission systems don’t care about filtering low frequencies, however, so why the 300 Hz low-frequency cutoff? It dates back to analog transmission systems, wherein different frequencies would be attenuated by different amounts, and an equalizer would restore them. The attenuation might be described as a certain number of decibels per decade. A decade, in this case, is a tenfold increase in frequency, as from 300 Hz to 3 kHz. Going down to 30 Hz from 300 would add another decade, doubling the equalization needed.
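The decade arithmetic is simple enough to sketch. The slope value below is purely illustrative (real analog lines varied); the point is only that each added decade of frequency range adds a fixed amount of equalization.

```python
import math

def decades(f_low, f_high):
    # number of tenfold frequency increases between two frequencies
    return math.log10(f_high / f_low)

SLOPE_DB_PER_DECADE = 20  # illustrative figure, not a measured line value

for lo in (300, 30):
    d = decades(lo, 3000)
    print(f"{lo} Hz to 3 kHz: {d:.1f} decade(s), "
          f"{d * SLOPE_DB_PER_DECADE:.0f} dB of equalization")
```

Going from a 300 Hz cutoff down to 30 Hz doubles the decade count from one to two, and therefore doubles the equalization required, whatever the actual slope of the line happens to be.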

Today, in the era of digital transmission, going down to 30 or even 20 Hz would not be a problem, which is why people describe today’s real-world telephone calls in such terms as “sounding like you’re next to me.” But the sound of a telephone-call voice in a movie or on TV still harks back to an earlier era (just as a print ad might tell its viewer to “dial” a certain phone number in an era when it’s hard to find a dial-equipped phone outside a museum).

It’s not easy on a visual web page to provide examples of telephone call sounds, especially since I have no idea what your listening equipment is like. But here is another common example of a motion-image-media indicator that strays from reality: the binoculars mask.

If you use binoculars, you probably know you’re supposed to adjust their eye separation so that there’s one circular image, not the lazy eight shown at left. But, if there’s no binoculars mask effect, how is a viewer supposed to know that the scene is seen through binoculars?

Now, perhaps, we can consider frame rate. Though he wanted telephone calls to sound just like being there in person, Fletcher did the research that identified the 300 Hz-to-3 kHz range for speech intelligibility and identification. Are there physical parameters affecting the choice of frame rate? There is more than one.

One is typically called the fusion frequency, the frequency at which a sequence of individual pictures appears to be a motion picture. You can find your own fusion frequency with a common flip book; an 1886 version called a Kineograph is shown at right.

Flip through the pages slowly, and they are individual still pictures. Flip through them quickly, and they are a single motion picture.

Unfortunately, there is no single fusion frequency. It varies from person to person and with illumination, color, angle, and type of presentation.

The type of presentation becomes significant in another frame-rate variable: what’s commonly called the flicker frequency, the rate at which sources of illumination appear to be steady, rather than flickering.

Some of the earliest motion-picture systems took advantage of a fusion frequency generally lower than the flicker frequency. They presented motion pictures, but they flickered, thus an early nickname for movies: flickers or flicks.

One “solution” to the flicker problem was the use of a two-bladed shutter in the projector. A film image would be moved into place, the shutter would turn, the image would appear on screen, the shutter would turn again, the image would disappear, it would turn again, it would reappear, and it would turn again while a new image moved into place. The result was an illumination-repetition rate twice that of the frame rate, perhaps enough to achieve the flicker frequency, depending, again, on a number of viewing factors.
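The shutter arithmetic is just multiplication, but it is worth making explicit. The 16 fps figure below is an assumption for illustration (a common nominal silent-film rate), not a number from any particular projector:

```python
def flash_rate(frame_rate, blades):
    # each blade opening shows the frame once per revolution, so the
    # screen is illuminated frame_rate * blades times per second
    return frame_rate * blades

for blades in (1, 2, 3):
    print(f"{blades}-bladed shutter at 16 fps:",
          flash_rate(16, blades), "flashes per second")
```

A two-bladed shutter turns 16 frames per second into 32 flashes per second, and a three-bladed shutter into 48, without moving any more film through the gate.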

While the two-bladed (or, in some cases, three-bladed) shutter helped ameliorate flicker, it introduced a new artifact into motion presentation. A moving object would appear to move from one frame to another but to stall in mid-motion from one shutter opening to another. Clearly, that was a step away from reality, but, like a limited-bandwidth telephone call and a binoculars mask, it tended to indicate the look of a movie.

What rate is required? When Thomas Edison initially chose 46 frames per second (fps) for his Kinetoscope, he said it was because his research had shown that “the average human retina was capable of taking 45 or 46 photographs in a second and communicating them to the brain.” But the publication Electricity, in its June 6, 1891 issue, contrasted the Kinetoscope’s supposed 46 fps with Wordsworth Donisthorpe’s Kinesigraph’s six-to-eight: “Now, considering that the retina can retain an impression for 1/7 of a second, 8 photographs per second are sufficient for the purpose of reproduction and the remaining 38 are mere waste.”

Is there a “correct” frame rate? This week’s Super Bowl coverage made use of For-A’s FT-One cameras (above), which can shoot 4K images at up to 900 fps. But that was for replay analysis.

At the International Broadcasting Convention (IBC) in Amsterdam in 2008, the British Broadcasting Corporation (BBC) provided a demonstration in the European Broadcasting Union (EBU) “village” that showed how frame rates as high as 300 fps could be beneficial for real-time viewing. At left is a simulation of 50-fps (top) vs. 100-fps (bottom), showing a huge difference in dynamic resolution (detail in moving images).

Note that the stationary tracks and ties are equally sharp in both images. The moving train, however, is not. Other parts of the demonstration showed that high-definition resolution might appear no better than standard-definition for moving objects at common TV frame rates.

A clear case seemed to be made for frame rates higher than those normally used in television. Again, that was in 2008. In 2001, however, Kodak, Laser-Pacific, and Sony each won an engineering Emmy award for making possible 24-fps video, a lower frame rate than that normally used.

As the BBC/EBU demo at IBC clearly showed, 24-fps video has worse dynamic resolution than even normal TV frame rates, let alone higher ones. Yet 24-fps video has also been wildly successful. It provides a particular look, just as a binoculars mask does. In this case, the look contributes to a sensation that the sequence was shot on film. But why did movies end up at 24-fps? It’s not Edison’s 46 nor Donisthorpe’s 8.

The figure is based on research but not research into any form of visual perception. Go back to the intertitle at the top of this column. Have you guessed the movie yet? It’s The Jazz Singer, the one that ushered in the age of sound movies, even though, as the intertitle shows, it, itself, was not an all-singing, all-talking movie.

Some say 24-fps was chosen as the minimum frame rate that would provide sufficient sound quality. But The Jazz Singer, like many other sound movies, used a sound-reproduction system, Vitaphone, unrelated to the film rate: phonograph disks. In the 1926 demo photo above, engineer Edward B. Craft holds one of the 16-inch-diameter disks. Their size and rotational speed (33-1/3 rpm, the first time that speed had been used) were carefully chosen for sound quality and capacity, but they could have been synchronized to a projector running at any particular speed.

That was the key. Sound movies did not require 24-fps, but they required a single, standardized speed. The choice of that speed fell to Stanley Watkins, an employee of Western Electric, which developed the Vitaphone process. Watkins diligently undertook research. According to Scott Eyman’s book The Speed of Sound (Simon & Schuster 1997), he explained the process in 1961:

“What happened was that we got together with Warners’ chief projectionist and asked him how fast they ran the film in theaters. He told us it went at 80 to 90 feet per minute in the best first-run houses and in the small ones anything from 100 feet up, according to how many shows they wanted to get in during the day. After a little thought, we settled on 90 feet a minute [24-fps for 35 mm film] as a reasonable compromise.”
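The bracketed conversion in Watkins’s account is straightforward: 35 mm film with standard four-perforation framing runs 16 frames to the foot, so feet per minute converts directly to frames per second.

```python
FRAMES_PER_FOOT_35MM = 16  # standard 4-perforation 35 mm framing

def fps_from_feet_per_minute(feet_per_minute):
    # film speed in feet per minute -> frames per second
    return feet_per_minute * FRAMES_PER_FOOT_35MM / 60

for fpm in (80, 90, 100):
    print(f"{fpm} ft/min = {fps_from_feet_per_minute(fpm):.1f} fps")
```

The projectionist’s 80-to-90 ft/min range works out to roughly 21 to 24 fps, and the “100 feet up” of the small houses to about 27 fps and beyond; 90 ft/min lands exactly on 24.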

That’s it. That’s where 24-fps came from: no visual or acoustic testing, no scientific calculation, just a conversation between one projectionist, one engineer, and, according to Watkins’s daughter Barbara Witemeyer in a 2000 paper (“The Sound of Silents”), Sam Warner (of Warner Bros.) and Walter Rich, president of Vitaphone. After Vitaphone and Warner Bros., Fox adopted the speed, and soon it was ubiquitous.

Fluke or not, 24 fps came to symbolize the look of film, which is why 24-fps video is so popular. We have a habit of associating that rate with movies.

The Hobbit broke that habit. It is available in a 48-fps, so-called “HFR” (high-frame-rate) version. And its look has received some unusual reviews.

Some have complained of nausea. It’s conceivable that there is some artifact of the way The Hobbit has been projected in some theaters (in stereoscopic 3D) that triggers a queasiness response in some viewers, but it seems (to me) more likely that those viewers might be reacting to some overhead, spinning shots in the same way that viewers have reacted to roller-coaster shots in slower-frame-rate movies.

Others have complained of a news-like or video-like look that made it more difficult for them to suspend disbelief and get into the story. That’s certainly possible. If 24-fps contributes to the look of what we are in the habit of thinking of as a movie, then 48-fps is different.

Of course, we no longer watch flickering silent black-&-white movies with intertitles, projected at a rate faster than they were shot, either. Times change.


What It Was Was Television

February 29th, 2012 | No Comments | Posted in Schubin Cafe

 

Did Thomas Edison predict television? According to some histories, the answer is yes, and the evidence is the image below, published on December 9, 1878 and captioned “Edison’s Telephonoscope (Transmits Light As Well As Sound).”

The award-winning historian Erik Barnouw referred to it as a “startling prediction,” though he correctly attributed it to writer and artist George du Maurier, not to Edison himself. Perhaps the image below will help you decide whether du Maurier was trying to predict anything. Published in the same periodical as the image above, Punch’s Almanack for 1879, it depicts the use of another supposed Edison invention, Anti-Gravitation Under-Clothing.

Punch was a humor publication, but neither that fact nor the anti-gravitation under-clothing has prevented some from insisting, based on the top image, that Edison predicted television. In fact, Edison did invent a telephonoscope, and it was well-publicized in the middle of 1878!

Unfortunately, it had nothing whatsoever to do with any form of television or image transmission. At left above is the word “telephonoscope” written by Edison, himself, for a patent caveat filed on May 10, 1878. At right is Edison’s drawing of the device, essentially a large, dual-tripod-mounted binaural ear trumpet for amplifying sounds.

Although the word telephonoscope was later used by another writer and cartoonist, Albert Robida, to describe television (see above), it seems pretty clear that Edison had nothing to do with it. In fact, it seems Edison didn’t like television under any name.

Last year, in a well-referenced book called The Quotable Edison, published by the University Press of Florida, editor Michele Wehrwein Albion wrote of the inventor, “Though he was in his eighties when television was pioneered, he felt threatened by the new technology, decades before it would be in everyday use.” The selected Edison quotations about television corroborate the statement.

Here are two of them: “[Television is] possible, but of very little general value. It’s a stunt” (The New York Times, February 12, 1927). “Locomotives are pretty well developed, but you wouldn’t want to buy one and have it in your house, would you? Television is like that” (The New York Times, December 24, 1930).

So Edison didn’t invent, like, or predict television? Well, actually, no.

Another quote from the book is this one: “[A] man can sit in his own parlor and see depicted upon a curtain the forms of players in opera upon a distant stage, and to hear the voices of the singers.” It is taken from the May 20, 1891 issue of the periodical The Electrical Engineer. It’s listed as one of the book’s motion-picture quotes, and there’s no question that publication would have considered the characterization correct at the time.

The periodical’s article is titled “The Edison Photophonokinetograph,” and it ends with Edison clearly describing a sound movie apparatus. But there’s a problem. The first part of the article is based on “a dispatch from Chicago May 12;” the last part, the part that’s unquestionably about a motion-picture apparatus, is from Brooklyn.

There are other peculiarities. Why use the word “distant” if referring to a movie system? And why (just before the Brooklyn section) reference stock and race tickers, which are live, not recorded?

Fortunately, The Electrical Engineer is not the only source for that story of the Edison prediction. It appeared not only in all of the Chicago newspapers of the time but also in newspapers and other publications around the world.

The earliest publication of that prediction (shown at left, from a clipping at Thomas Edison National Historic Park) — the same day Edison made his statement — is from The Chicago Evening Post, Tuesday, May 12, 1891. The article’s headline was “Edison’s in Chicago.”

The world’s most famous inventor had come to town to discuss electric power at the upcoming World’s Columbian Exposition, a World’s Fair intended to celebrate the 400th anniversary of Christopher Columbus’s first voyage of discovery. It was a big deal (see the Fair’s Electricity Building below), so he was swarmed with press.

As it turned out, Westinghouse underbid Edison and ended up providing the electric power and lighting that led (at least in part) to the fair’s being called “The White City.” But no one knew about that on May 12, 1891, so Edison was asked about what he’d show at the fair. He said he had so much to show that he wanted more than an acre of exhibit space. And what specific “novelty” might he show?

The answer to that question was reported differently in different publications. According to one, The Chicago Tribune, on May 13, it was “the kinetograph,” “a combination of photography and phonography.” That’s a good description of a sound movie, and, in an age when both phonographs and movie systems were hand cranked, there was no need for electricity. But, as in the report in The Electrical Engineer, most publications called it “a happy combination of photography and electricity” (emphasis added).

At right is a portion of that first published report in The Chicago Evening Post. It indicates that the viewer will require “having electrical connection with the theatre” in order to see and hear what’s going on there. Movies do not require such a connection.

The Wellington, New Zealand Evening Post noted the new invention’s similarity to the telephone (but including a visual element) and quoted Edison as saying that, if opera singer Adelina Patti “be singing somewhere, this invention will put her full length picture upon canvas so perfectly as to enable one to distinguish every feature and expression of her face, see all her actions, and listen to the entrancing melody of her peerless voice.” It, too, described an electrical connection between viewer and source.

The unavoidable conclusion from the vast majority of the reports (The Chicago Tribune being the exception) — the combination of photography with electricity, the distant stage, the electrical connection between viewer and source, the likening of the invention to a visual form of the telephone and stock and race tickers — is that Edison predicted that he would show some form of television at the 1893 Chicago World’s Fair.

Did he mean it? That’s a different question.

Even according to The Chicago Tribune, Edison made the grant of sufficient space at the fair a condition of his showing the new invention. So maybe it was a bargaining ploy. Or, maybe, like du Maurier’s report of Edison’s Telephonoscope more than a dozen years earlier, it was all a joke.

According to The Chicago Evening Post, Edison said “this invention does not have any particular commercial value.” When asked what it was, he began, “We-ell” and “released a diminutive laugh.” Edison then described what was still wanting in the invention.

“‘But you will be able to supply that want!’ some one anxiously inquired.

“Mr. Edison smiled by way of reply and in a way that all doubts were swept away.”


How Different Is 3D?

November 11th, 2010 | No Comments | Posted in 3D Courses, Schubin Cafe

When you watch a televised advertisement for an automobile, do you fear there’s a moving car in the room with you? I didn’t think so. But more on that later.

This post is about human perception of 3D imagery. It’s also about how we see moving images in general and about color, sound, carsickness, and the idea of smashing open a TV set with a hammer to allow the tiny people inside to be seen more clearly.

That last suggestion probably first appeared in 1961 in an age-inappropriate alphabet tome called Uncle Shelby’s ABZ Book, written by Shel Silverstein. In it, T was for TV. The book indicated that small performing elves lived inside the television set and that an adventurous child reader using a hammer to break open the tube “will see the funny little elves.”

That same year, Colin M. Turnbull of the American Museum of Natural History published “Some observations regarding the experiences and behavior of the BaMbuti Pygmies” in the American Journal of Psychology. One of the observations seems related to those little elves in the television set.

“As we turned to get back in the car, Kenge looked over the plains and down to where a herd of about a hundred buffalo were grazing some miles away. He asked me what kind of insects they were, and I told him they were buffalo, twice as big as the forest buffalo known to him. He laughed loudly and told me not to tell such stupid stories and asked me again what kind of insects they were. He then talked to himself, for want of more intelligent company, and tried to liken the buffalo to the various beetles and ants with which he was familiar.” http://www.wadsworth.com/psychology_d/templates/student_resources/0155060678_rathus/ps/ps06.html

Those of us who grew up with television and open spaces might find both stories equally ludicrous. We know the people we see on a TV screen are full size (and don’t live inside the television set) and so are distant animals. But why do we know that?

Based on the angles their images form on our retinas, we should think the people we see on a small TV screen are tiny. We don’t only because we’ve learned what TV is. Kenge, a life-long forest dweller, had never been exposed to distant vision, so he’d never learned how small things might look when viewed from far away.

What does that have to do with 3D? Take a look at the diagram at the left. It was created by Professor Martin Banks of the Visual Space Perception Laboratory at the University of California – Berkeley. The vertical axis represents viewing distance from a movie or TV screen, the “accommodation” or eye’s-lens focusing distance. The horizontal axis represents the depth within a stereoscopic 3D image where something appears to be, the “vergence” or “convergence” distance, the distance to which the two eyes point (“vergence” is used because eyes can both converge and diverge).

The dark-colored area represents a comfortable viewing zone — a depth range where 3D viewing should not make viewers feel sick. The lighter-colored area represents a potentially uncomfortable “fusion” zone, where viewers can combine the two eye views into a single object or character, though they might not like doing so. Outside that zone, even fusing the two images into one can be a problem.

At viewing distances of at least 3.2 meters (easily achieved in cinema auditoriums; less common in homes), the comfort zone appears to extend out to an infinite depth behind the screen, and only very close vergence depths are a problem. At shorter (home) viewing distances, even significant depth behind the screen can cause discomfort, as well as in front of it.

There’s an easy solution to the problem, one put forth in the white paper “3D in the Home,” previously available on the web site of the 3D company In-Three: http://in-three.com/

In accordance with the comfort-zone plotted above, the In-Three white paper said depth could extend to an infinite distance behind the screen for movie-auditorium viewing, with restriction only for imagery extending in front of the screen. As shown in the diagram at right, however, for a home-theater viewing distance of six feet, the white paper suggested restricting depth behind the screen to just four feet and depth in front of the screen to less than two feet. That depth range, too, seems well within the vergence-accommodation comfort zone.
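The relationship between on-screen disparity and perceived depth can be sketched with similar-triangles geometry. This is a simplified model, assuming a viewer centered on the screen and a typical 65 mm interocular separation (both assumptions, not figures from the white paper):

```python
def vergence_distance(view_dist, disparity, interocular=0.065):
    """Distance at which the eyes converge (same units as view_dist),
    given the separation between homologous points on the screen:
    positive disparity = uncrossed (object appears behind the screen),
    negative = crossed (object appears in front).
    Simplified similar-triangles model for a centered viewer."""
    if disparity >= interocular:
        # eye lines parallel or diverging: depth at or "beyond" infinity
        return float("inf")
    return interocular * view_dist / (interocular - disparity)

VIEW = 1.83  # roughly a 6-foot home viewing distance, in metres
for p in (-0.02, 0.0, 0.02, 0.065):
    print(f"disparity {p * 1000:+.0f} mm -> "
          f"vergence distance {vergence_distance(VIEW, p):.2f} m")
```

Zero disparity puts the object exactly at the screen; disparity equal to the interocular separation pushes it to infinity. Accommodation, meanwhile, stays fixed at the screen distance, which is the conflict the comfort-zone diagram plots.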

It might be possible to restrict shooting to that depth range in a talking-heads-style public-affairs discussion. But that’s an extremely limited range.

It’s unlikely to be sufficient even for a variety or reality show, let alone for most sports. Two football players standing side-by-side perpendicularly to the camera might exceed the range all by themselves.

Another alternative, therefore, is to shoot the natural scene depth but adjust homologous points in the two eye views so that the depth presented on a home display does not stray beyond the comfort zone. Unfortunately, the shrunken depth might cause those football players to be perceived as being tiny, like the supposed buffalo insects or mythical TV-set elves.

Professor Banks is well qualified to discuss discomfort associated with viewing stereoscopic imagery. He designed an impeccable experiment that proved that a vergence-accommodation conflict could cause discomfort (one experimental subject even aborted the sequence due to extreme queasiness). At right, a subject bites a bar to ensure accurate distance measurements. But Banks was by no means the first person to note the consequences of a vergence-accommodation (V-A) conflict.

The zone of comfort is often called Percival’s zone in honor of Archibald Percival, who published “The Relation of Convergence to Accommodation and Its Practical Bearing” in Ophthalmic Review in 1892 (and even in that paper, Percival attributed ideas to prior work published by Franciscus Donders in 1864). The reason eye doctors have been concerned about V-A conflict relates, in part, to eyeglasses. If you wear them, you might have noticed a queasy feeling when you put on your first pair or when there was a substantial change in the prescription. But that feeling probably faded as you became accustomed to the V-A conflict.

Another group that was interested in V-A conflict was the original National Television System Committee (NTSC), which began meeting in 1940, the year this off-screen photo was taken. WRGB was named in honor of Dr. Walter Ransom Gail Baker, the engineer who became the head of the NTSC (the initials also stand for white-red-green-blue color systems).

The first NTSC came up with the standard for American black-&-white television, but they were also concerned about color. One of their concerns was that simple lenses (like those in our eyes) cannot focus red and blue in the same place at the same time. The change in focus is a change in accommodation, potentially leading to a V-A conflict. In other words, color TV, in theory, could have made people sick.

In fact, the NTSC concluded that it wouldn’t, based on such work as a paper by Technicolor research director Leonard Troland published in the 1926 American Journal of Physiological Optics specifically related to color motion pictures and the V-A conflict.  But, even if color TV would have made viewers sick in 1926, would it always have done so?

Consider, for example, a short movie shot by the Lumiere brothers in 1895, L’arrivée d’un train en gare de La Ciotat (The Arrival of a Train at the Station of La Ciotat). The original looked a little better than what’s shown here, but it was black-&-white and silent. And it’s clear that the train is not heading straight towards the camera.

Nevertheless, here is a report (translated from the original French) from Henri de Parville, an audience member at an early screening. “One of my neighbors was so much captivated that she sprang to her feet… and waited until the car disappeared before she sat down again.” The same reaction was not reported from screenings of other movies, such as one of workers leaving the Lumiere factory. In other words, it seems as though the crude, silent, black-&-white movie made at least one audience member react as though there were a locomotive in the screening room.

About a quarter-century later, Thomas Edison conducted what he called “tone tests,” at which audience members were blindfolded or placed in a dark room and asked if they could tell the difference between a live opera singer and a mechanical phonograph recording of one. Here’s a contemporary account from the Pittsburgh Post in 1919 about a test conducted at a concert hall. “It did not seem difficult to determine in the dark when the singer sang and when she did not. The writer himself was pretty sure about it until the lights were turned on again and it was discovered that [the singer] was not on the stage at all and that the new Edison [phonograph] alone had been heard.”

It might seem ridiculous to readers today that a viewer could be scared by a silent, black-&-white movie of a train or that a listener couldn’t tell the difference between a live singer and a mechanical recording of one (in fairness, I should point out that one of the singers revealed, many years later, that she’d taught herself to sound like a phonograph recording). But that’s because we’ve learned to perceive the differences between those recordings and reality.

There are many examples of such perception education. You might have outgrown your childhood carsickness, for example, just as sailors get over seasickness.

In 3D, research into the amount of time it takes subjects to fuse stereoscopic images has found not only improvement with experience but even the ability of those who underwent the experiments to fuse stereoscopic images more rapidly when tested again after a very long period of no exposure to stereoscopic images. 3D perception, it seems, comes back, just like riding a bicycle. And some eye doctors specialize in training people with stereoscopic perception problems: http://www.vision3d.com/

There are two pages of health warnings in the manuals of Samsung 3DTVs, and at least some of them may be very well justified by such issues as the vergence-accommodation conflict.  But that doesn’t mean viewers will always have problems watching 3DTV.


The Impossible Dream: Perfect Lip Sync

March 31st, 2010 | No Comments | Posted in Schubin Cafe

There is definitely plenty that can be done to improve lip sync.  Making it perfect, however, might not be possible.

Perhaps it would be best to start with a definition.  Lip sync is the synchronization of the sounds emerging from moving lips with the images of those moving lips.  No moving images, no lip-sync issues, per se.

There are many creation myths, and one associated with moving images and sound is that The Jazz Singer (1927) was the first sound movie.  It wasn’t.

It wasn’t even the first Warner Bros. Vitaphone synchronized-sound feature movie.  And it wasn’t the first “all-talking, all-singing” sound movie, not least because it wasn’t all talking or all singing.  Here’s a typical “silent” movie “intertitle,” from one of many non-talking sections of The Jazz Singer.

What the first sync-sound movie actually was is not obvious.  Scientific American suggested adding sound to 3D projected images in 1877, but those were to be still pictures.  Wordsworth Donisthorpe responded in Nature a few weeks later that he could do it with moving pictures.

It’s possible (based on recollections decades later) that some experimental apparatus was built around 1888.  Edison wrote in his fourth motion-picture patent caveat that “all movements of a person photographed will be exactly coincident with any sound made by him.”

There’s no question that Edison demonstrated sound-movie Kinetophones by 1893.  But, despite a contemporary report that the sound was in sync with the pictures, it’s possible that the sound merely started at the same time as the pictures.  And the Kinetophone was a one-viewer-at-a-time, short-duration system.

There’s also no question that a form of sync-sound movies was shown at the Phono-Cinéma-Théâtre at the World’s Fair in Paris in 1900.  But the system was different from what we’re accustomed to in video production today.

First, the pictures were captured.  Then, watching the images of themselves on screen, the performers lip-synched to what they had done during a phonograph sound-recording session.

In presentation, the process was reversed.  The projectionist used a telephone receiver to listen to the sound (from a phonograph in the orchestra pit) and adjusted the cranking speed of the projector to maintain lip sync (or at least to attempt to maintain something pretty close to proper lip sync).

True lip sync, with sound and picture locked, was actually patented towards the end of the 19th century, and implemented no later than the first decade of the 20th.  More of the history may be found here: http://filmsound.org/ulano/talkies.htm

From roughly the beginning of the 20th century to the introduction of digital video processing in the early 1970s, there was good lip sync.  But it wasn’t always automatic.

Movie sound was typically recorded separately from pictures.  A clapper atop the slate provided a sync point, and various mechanisms were used to make the camera and sound-recorder motors run in sync, but sound was manually synchronized to picture.  Video recorders captured both sound and picture together, but editors using early mechanical equipment had to account for the considerable distance between the video and audio heads.

Then came that digital video processing.  The CVS 500 in 1973 could not only synchronize incoming feeds but also shrink them to a quarter of their size, something that seems trivial today but was near miraculous at the time.  Unfortunately, it also delayed the video by one field (half a frame).

In the grand scheme of things, half a frame is not a lot.  But multiple passes through video-delaying devices soon followed.  A feed to a network might get synchronized, and then the network’s feed to a station might get synchronized again.  One pass through a digital effects processor might have been used to shrink an image so it fits within a larger one, and another pass might have been used to push both images off the screen.

International standards converters intentionally used longer delays to help with their frame-rate conversion.  Today, there are also up- and down-converters to and from HDTV and 24p.
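The way those delays pile up is easy to tally.  Here is a minimal sketch, not from the article, that sums per-pass video delays in fields and reports the matching audio delay a facility would need; the device names and field counts are purely illustrative:

```python
# Illustrative only: each processing pass delays video by some number of
# NTSC fields; the audio path must be delayed by the same total to keep sync.

FIELD_MS = 1000 / 59.94  # duration of one NTSC field, about 16.7 ms

passes = {
    "network frame synchronizer": 2,   # hypothetical delays, in fields
    "station frame synchronizer": 2,
    "digital effects, pass 1": 1,
    "digital effects, pass 2": 1,
}

total_fields = sum(passes.values())
total_ms = total_fields * FIELD_MS
print(f"Video delayed by {total_fields} fields ({total_ms:.1f} ms); "
      f"audio must be delayed to match.")
```

Even this modest chain delays the picture by three full frames, which is why unmatched audio becomes obvious so quickly.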

There was even a video delay caused, during a brief period of madness in U.S. television, by a different timing issue.  When NTSC color was introduced in 1953, there was no specified relationship between the phase of the color subcarrier and the horizontal sync pulse, because it didn’t matter.  When color recorders were introduced, however, that lack of specificity tended to increase the size of the horizontal blanking interval (the period between the end of video at the right edge of the picture and its start at the left).

After enough generations of re-recording and editing, the increase could violate FCC regulations (though it was almost never enough to be visible on a home TV).  So, after digital video effects units were introduced that could expand the picture, broadcasters began using them to conform to the regulations.  Pictures got blurry, and sound got out of sync, before the FCC announced that it wouldn’t demand the correction.

All of those video-delaying devices advanced the sound, the worst possible lip-sync problem.  And, initially, there were no matching audio delays.  Some news broadcasts (usually involving frame synchronizers and often adding standards conversion and video effects) started to look as non-synchronous as some Fellini movies.  Today, with audio delays available, there’s no longer any good excuse for lip-sync errors in production and post.

Then there’s distribution, commonly involving MPEG bit-rate reduction.  Presentation time stamps (PTS) are used to lock audio and video together.  Unfortunately, decoders aren’t required to use them, and, if they don’t, lip sync can slip.  If your TV set, cable box, or satellite receiver has slipping lip sync, the best you can do (other than complaining) is change channels and come back; the signal interruption will usually cause the decoder to re-lock audio to video.  And, if you’ve been watching the same channel for a long time, it might be a good idea to change channels and return before settling in for a movie.
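For the curious, the bookkeeping behind those time stamps is simple.  A minimal sketch, assuming MPEG-style PTS values on the standard 90 kHz systems clock (the function name is my own, not from any decoder API):

```python
# MPEG systems time stamps tick at 90 kHz; comparing the audio and video
# PTS of units presented together reveals any lip-sync drift.

PTS_CLOCK_HZ = 90_000  # MPEG-2/MPEG-4 systems PTS/DTS tick rate

def av_drift_ms(video_pts: int, audio_pts: int) -> float:
    """Positive result: audio is early relative to video (advanced sound)."""
    return (video_pts - audio_pts) * 1000 / PTS_CLOCK_HZ

# e.g. video PTS 909000 vs. audio PTS 906300: audio 30 ms early
print(av_drift_ms(909_000, 906_300))  # 30.0
```

A decoder that honors PTS keeps this drift at zero; one that merely free-runs after an initial alignment is the kind that slips.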

After enough complaints or lost business, perhaps all decoders will someday keep and maintain lip sync.  And it’s certainly possible to make sure any full-picture video delays are matched by audio delays (imaging chips and displays sometimes introduce differential delays between the tops and bottoms of pictures, but they’re very brief).  But then there is space, the final frontier as far as lip sync is concerned.

Light travels so fast that it’s essentially instantaneous.  Sound is much slower.  Aircraft have traveled faster than sound; bullets do so routinely.  At nominal temperature and humidity, sound travels a little less than 37 feet in the course of one video frame.

If someone is singing 50 feet away from a microphone (as on an opera stage), the audio will be picked up more than a frame late.  If the sound is then heard in the back row of a movie theater, there will be still more delay.
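That arithmetic is worth making explicit.  A small sketch, assuming roughly 1,100 feet per second for sound in air (consistent with the just-under-37-feet-per-frame figure above) and NTSC’s 29.97 frames per second:

```python
# How many video frames late does a sound arrive from a given distance?
# Assumes ~1,100 ft/s for sound in air and 30000/1001 (29.97) fps video.

SPEED_OF_SOUND_FT_S = 1100.0
FRAME_RATE = 30000 / 1001  # NTSC frame rate, ~29.97 fps

def acoustic_delay_frames(distance_ft: float) -> float:
    return distance_ft / SPEED_OF_SOUND_FT_S * FRAME_RATE

print(acoustic_delay_frames(50))  # the opera-stage example: more than a frame
```

At 50 feet the sound arrives about a third of a frame beyond one full frame late, before any theater acoustics add their own delay.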


Inauguration of President Harding

There’s one way around this.  It’s called visual-acoustic perspective.  When we see someone speaking from a distance, we don’t expect the lip sync to be correct.  That’s why someone sitting three frames away from the stage, hearing a singer two frames behind the proscenium, doesn’t think there’s anything wrong.

Unfortunately, tight lenses can create close-ups, and close-ups make people want tight lip sync, even when it’s physically impossible.  There have already been cases when viewers of live transmissions to cinemas have complained of varying lip sync when all that was happening was cutting between wide shots and close-ups.

Directors of productions shown at large viewing distances should bear that problem in mind.  Otherwise, there’s not much that can be done about acoustic lip-sync issues.  Advancing the sound doesn’t help viewers in the front row.

Otherwise, just make sure all video delays are matched by audio delays.  And complain regularly about decoders not using time stamps.


The First Sports Video

July 10th, 2009 | 4 Comments | Posted in Schubin Cafe

For this, my first post on a Sports Video Group blog, I thought I’d look back at the first sports video. Ah, but when was that?

Press coverage of the 2009 book “Rome 1960: The Olympics That Changed the World,” by David Maraniss, sometimes said those were the first Olympic Games to be televised. They weren’t. The 1948 London Olympics were televised, and they weren’t the first, either.


Princeton at Columbia, NBC TV 1939

On March 3, 1940, The New York Times noted that “It is becoming increasingly evident to the telecasters in handling athletic events in a wide-playing area such as required by baseball, football and hockey that more than one camera is necessary.” The previous year, on May 18, Louis Effrat reported in the same newspaper that the previous day’s second baseball game between Columbia and Princeton “was televised by the National Broadcasting Company, the first regularly scheduled sporting event to be pictured over the air waves.”

It wasn’t, and never mind the bicycle race that NBC had begun televising two days earlier.

A football (soccer) match between Arsenal and Arsenal Reserves was televised in England on September 16, 1937.  The Wimbledon tennis tournament was on TV before that.  And amateur boxing had been televised on February 4 of that year.
