Chinese tech firms’ ‘digital humans’ who look and sound like you: What are their uses and pitfalls?

Earlier this month, I attended Tencent’s Global Digital Ecosystem Summit in Shenzhen, where I got to experience its Digital Human showcase at the Shenzhen World Exhibition & Convention Center.

Along with other international media, including journalists from Malaysia and Indonesia, I was introduced to a digital version of Mr Dowson Tong, CEO of Tencent Cloud. Big crowds gathered to watch the virtual Mr Tong appear on a large screen and deliver a presentation in three languages: Chinese, English, and then Bahasa Indonesia.

Then it was my turn (I was quite eager to try out the technology).

With my phone, I scanned a QR code that directed me to Tencent’s website, where I submitted a recent photo, a 30-second voice recording, and a 100-word paragraph of text for my digital human to say.

I also had the freedom to choose from nine different languages for my digital human: Chinese, English, Korean, Japanese, Arabic, Bahasa Indonesia, Thai, French, or German. There was even an option to switch the voice to a different gender, which I found fascinating.

Unfortunately, there were technical difficulties. Despite help from Tencent staff, I wasn’t able to generate my digital human on site, so the process had to be completed remotely once I returned to Singapore.

I had help from the team at Tencent Cloud and Smart Industries Group. I provided a one-minute voice recording, along with a 30-second video. I was told that my face and mouth had to be clearly visible throughout. And for fun, I decided to recite tongue twisters, making sure my face moved in sync with my words.

Before the final step, I was required to provide consent, so I recited: “I, Melody Chan, am aware that recordings of my voice will be used by Tencent Cloud to create and use a synthetic version of my voice.”

When the final product arrived, it was startlingly lifelike, almost a mirror of the video I had submitted. The voice, while slightly robotic, was impressive nonetheless. Hearing myself speak fluently in languages I don’t know, like Thai, Arabic, and French, was surreal.

However, there were a few small glitches. The sentence breaks felt a bit unnatural, and if you paid close attention, you could spot my hand gestures repeating themselves.

In the end, creating my own hyper-realistic digital doppelganger was undeniably fun, but also a little dystopian. It felt like something straight out of Black Mirror.

Seeing ‘Digital Melody’ come to life made me wonder… how long before AI comes for our jobs?