Exploring the power of AI for cinematic-style video generation – experiments and insights
In today’s digital age, video is a powerhouse for marketing, captivating audiences and driving engagement like no other medium. Yet creating quality videos tends to be both expensive and time-intensive, so video is often used only sparingly.
With AI revolutionising video generation, it’s crucial to explore how these advancements can support marketing strategies. Recently, OpenAI’s “Sora” and Google’s “Veo” have showcased AI’s ability to create cinematic-style videos, pushing the boundaries of what’s possible. However, neither of these tools is currently publicly available.
In this blog, we’ll dive into the exciting world of AI-driven video generation, focusing on “Dream Machine” by Luma, a new AI tool for creating cinematic-style videos which, unlike Sora and Veo, has just become publicly available. Luma’s Dream Machine allows users to create stunning 5-second AI videos, making cutting-edge technology available to everyone. Join me as I test Luma’s Dream Machine and explore its current capabilities and potential to transform video marketing.
Experiment 1: Comparison between Luma AI video and Sora AI video
My first test was a comparison to Sora. I decided to check how Luma would perform the task of creating the well-known “stylish woman in Tokyo” video that Sora showcased at its launch in February 2024. I chose this video because, while not without fault, I find it one of the best examples of Sora’s capability.
However, it should be noted that the Sora video is 60 seconds long, much longer than the 5 seconds that Luma currently allows. So, it is really only the first 5 seconds of the Sora video that are being compared here.
To ensure the same conditions for the test, I used the same prompt that was used for the Sora video: “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.”
The resulting video created by Luma, shown below, was impressive, high quality and convincing and, for those first 5 seconds, definitely on a par with Sora. Because the prompt is detailed and gives clear directions on aspects such as clothing, the two videos are similar in some ways, but they also show quite different interpretations of the aspects that the prompt left open.
Experiment 2: Comparison between Luma AI video and Veo AI video
The next test followed the same pattern, but this time the goal was to compare Luma’s Dream Machine to Google’s Veo. I chose a short Veo video clip which shows someone travelling down a road in a suburban area. This clip is ca. 5-6 seconds long, so this time the clips I was comparing were nearly equal in length.
As before, I used the same prompt that was used for the Veo clip, quite a short one in this case: “A fast-tracking shot down a suburban residential street lined with trees. Daytime with a clear blue sky. Saturated colors, high contrast”
The Google Veo clip, in my view, is medium quality. It has some good details on houses, trees, driveways etc., but there are also some strange aspects to it, and it certainly has an AI feel to it.
Likewise, the video from Luma’s Dream Machine, shown below, was mediocre, with a strong AI feel to it. In particular, the trees and the foliage were extremely poorly done. However, unlike the Veo version, this version included some cars, which added a bit of life and realism.
So, both videos could perhaps be used in some contexts, but both still show a lot of room for improvement.
Experiment 3: Testing Luma’s capability to generate video localised for the European market
One aspect that had struck me about both of the suburban videos in experiment 2 was that the suburban road in each video looked very American and, as such, was unsuitable for a European context.
So, I decided to test how good Luma would be at creating a localised video and to check how much non-American data its models were based on. I used the same prompt as in experiment 2, but with a slight variation, i.e. I added the words “in Germany or in another European country” to the prompt. Of course, suburban roads look very different from one European country to another. Therefore, by naming one country (Germany), yet allowing Luma to base its video on other European countries, I hoped it would be able to produce results suitable for at least one European country.
However, the result, which you can see below, was rather bizarre and very unsatisfying. The style of the houses had changed, and you could see some Dutch-style roofs. Indeed, it seemed that one and the same roof was repeated several times. No other parts of the houses, such as doors or windows, were visible, and the street was completely empty, with no cars, people or any other signs of life. The electricity poles were of the wooden style you might still find in some rural locations, but they did not seem appropriate for the modern suburban street shown in the rest of the image.
In sum, the result was very poor and, at first sight, suggests to me that the non-American image data behind the model is limited.
Experiment 4: Testing Luma’s capability to generate video from a different era
In this last example, I did not compare to Sora or Veo, but wanted to test Luma’s capability for a private project I am working on, where I am creating a video, mostly consisting of single photos, as a backdrop to a song. The first verse of the song refers back to times gone by and describes a scene of a child playing at a harbour, watching the fishermen mend their nets. My goal was to create a short video clip as background for this first verse.
I provided this information to ChatGPT and asked it to help me write a suitable prompt for a video. This is something I like to use ChatGPT for, as visual prompts are quite specific, and I find it helpful to use ChatGPT or other AI tools to help write these prompts.
ChatGPT provided quite a lengthy prompt, which I then shortened and cleaned up slightly before giving the following version to Luma to work on: “A peaceful, nostalgic harbour scene at dusk. A young boy, around 8 years old, plays near the harbour wall. Fishermen are seen in the background, diligently mending their fishing nets, preparing for the upcoming night at sea. The setting sun casts a warm, golden hue over the entire scene, reflecting off the calm waters and giving a serene, almost timeless feel. Key Elements: 1. Young Boy: Casual clothes, perhaps wearing a cap, playing with a small toy boat or skipping stones. 2. Fishermen: A few men with weathered faces, focused on their work, repairing nets with nimble hands. They wear traditional fishing attire. 3. Harbour Wall: Old, sturdy, with a slightly worn look, adding to the nostalgic atmosphere. 4. Setting Sun: Low in the sky, casting long shadows and a golden glow over the scene. 5. Calm Water: Gentle ripples, reflecting the warm colours of the sunset. This should provide a vivid and evocative visual representation to complement the song’s first verse.”
Luma produced the video below, which I consider quite good:
Yes, there are some inconsistencies in it (e.g. the boat in the background and the ropes hanging in the foreground), but overall it is good, especially the scene, the boy’s relaxed play and exploration, the atmosphere, and the lighting. However, I had forgotten to mention one important fact, namely that the scene is set in Ireland in times gone by. Luma’s choice of a non-white boy therefore did not fit the setting.
So, I needed to do this again and this time be more specific about location and time. I changed my prompt accordingly and made several attempts, all with disappointing results. Each showed different problems that made it unsuitable, e.g. a Hollywood interpretation of what Irish is, modern houses in the background, unrequested adults placed into the video and detracting from the story, or a strange interpretation of a ‘playing child’. In sum, the results were unsatisfying, though in some cases rather funny, even if unintentionally so.
After several rounds of this, I decided to make one more attempt, this time using an old photo I had that related to the harbour in the song. The photo showed people, including some children, sitting on the upper pier wall and was thus suitable. I used the previous prompt, attached the photo as part of the prompt and added the quite open-ended instruction “use this photo as inspiration”.
And this time, the result was quite good. Luma basically used and extended the photo as is, then added a boy in similar clothing to others in the photo and animated him. It was not quite what I expected when I said “use this photo as inspiration”, but the result overall was good. Instead of the Hollywood look, this boy fitted the scene, and it was quite amazing to see the photo brought to life in this way.
One small problem was that, in the generated video, the boy walked extremely fast. This speed issue was easily remedied with the help of another video editing tool (Flixier), by reducing the playback speed to 75%. Other minor issues are the boy’s gait and, in close-up, his hair and facial features. But these are really minor points, and they don’t matter given the purpose of this video.
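For readers who would rather script the slowdown than use an editor, the arithmetic behind the 75% setting can be sketched in a few lines of Python. This is only an illustration under my own assumptions: I did the actual edit in Flixier, the function names below are hypothetical, and the ffmpeg `setpts` filter is named here as one possible alternative, not as what Flixier does internally.

```python
def slowed_duration(original_seconds: float, speed: float) -> float:
    """Return the clip length after playing it at `speed` (1.0 = unchanged)."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return original_seconds / speed


def setpts_expression(speed: float) -> str:
    """Build an ffmpeg `setpts` video-filter expression for the given speed.

    Assumption: ffmpeg is used instead of Flixier. Dividing the presentation
    timestamps (PTS) by a speed below 1.0 stretches them, which slows playback.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return f"setpts=PTS/{speed}"


if __name__ == "__main__":
    clip_seconds = 5.0  # clip length Dream Machine currently produces
    speed = 0.75        # the 75% setting described above
    print(f"New duration: {slowed_duration(clip_seconds, speed):.2f} s")
    print(f"ffmpeg video filter: {setpts_expression(speed)}")
```

In other words, slowing a 5-second clip to 75% speed stretches it to roughly 6.7 seconds, which is worth keeping in mind when the clip has to fit a fixed section of a song.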
Practical implications of today’s AI video for business
In conclusion, AI-powered video generation tools like Sora, Veo, and Luma’s Dream Machine have ushered in a new era of content creation, one that is more accessible, efficient, and cost-effective than ever before. For small and medium-sized enterprises (SMEs), these tools present a unique opportunity to elevate brand presence, engage audiences, and drive results without the need for extensive resources or technical expertise.
Yet, while there has been tremendous progress in this field, it’s important to acknowledge that AI video generation is still evolving, with considerable gaps and shortcomings still to be addressed. That said, the progress of the last few months alone has been huge. Because the potential for innovation and improvement is immense, staying informed about the latest developments is crucial for SMEs looking to leverage this technology effectively.
For businesses ready to explore the possibilities of AI video generation, here are a few action items to consider:
- Watch this space: The field is constantly evolving, with new features and capabilities emerging regularly. Keep an eye on the latest advancements to ensure you’re leveraging the most cutting-edge tools available.
- Experiment and learn: The best way to understand the potential of these tools is to dive in and start creating. Experiment with different platforms, explore their capabilities, and identify their limitations. This hands-on experience with AI tools and AI video will help you determine the best use cases for your specific needs.
- Consider use of AI video based on context and purpose: While current AI-generated videos may not be suitable for a lot of professional applications yet, they can be a powerful asset for social media content, internal communications, and other less formal use cases. As the technology matures, expect to see even more sophisticated and versatile applications emerge.
By staying informed, experimenting, and strategically incorporating AI tools, including AI video generation, into your content strategy, your SME can harness the power of this transformative technology to drive engagement, enhance brand visibility, and achieve your marketing goals.
Here at Glocafy we are committed to harnessing the power of AI to transform business and international marketing, but only where it makes a meaningful difference and aligns with our strict ethical guidelines and AI policy. If you’re ready to explore responsible AI solutions that drive results, reach out to Glocafy today.