PS: Many paid tools have appeared on the market, similar to Jihu Manjian and SuTui, but their core functionality is essentially the same; what still needs validating is how well GPT performs in this pipeline. This year Sora emerged as an evolved form of this field, and it is more likely to impact the film and television production sector (UE4).
Function Design#

- Extract storyboard scenes: segment the novel text into sentences, generate images with SD, and synthesize speech with TTS.
- Novel content → derived prompt words (for SD image generation).
- Merge the images and audio into a video.

Models: TTS (edge), SD image model (cetusMix_Whalefall2 here), GPT (Gemini here).

Project address: story-vision
Core Code#
Novel Storyboard Extraction GPT#
```python
prompt = """I want you to create a storyboard based on the novel content, inferring scenes from the original text description.
Infer and supplement missing or implied information, including but not limited to: character clothing, hairstyle, hair color, complexion, facial features, posture, emotions, and body movements; style description (including but not limited to: era, space, time of day, geographical environment, weather); item description (including but not limited to: animals, plants, food, fruits, toys); visual perspective (including but not limited to: character proportions, camera depth, observation angle). Do not overdo it.
Describe richer character emotions and emotional states through camera language, and generate new descriptive content in full sentences once you understand this requirement.
Change the output format to:
Illustration 1:
Original description: the corresponding original sentences;
Scene description: the corresponding scene plot content;
Scene characters: names of the characters appearing in the scene;
Clothing: the protagonist is dressed casually;
Location: sitting in front of the bar;
Expression: facial lines are gentle, expression is pleasant;
Behavior: gently swaying the wine glass in hand;
Environment: the bar background is dark-toned, candlelight flickers in the background, giving a dreamy feeling.
If you understand this requirement, please confirm these points, and return results containing only these points.
The novel content is as follows:"""
```
```python
def split_text_into_chunks(text, max_length=ai_max_length):
    """
    Split text into chunks of at most max_length characters,
    ensuring that splits only occur at line breaks.
    (ai_max_length comes from the project configuration.)
    """
    lines = text.splitlines()
    chunks = []
    current_chunk = ''
    for line in lines:
        # Join with a newline so the original line breaks survive.
        candidate = line if not current_chunk else current_chunk + '\n' + line
        if len(candidate) <= max_length:
            current_chunk = candidate
        else:
            chunks.append(current_chunk)
            current_chunk = line
    chunks.append(current_chunk)
    return chunks
```
```python
def rewrite_text_with_genai(text, prompt="Please rewrite this text:"):
    chunks = split_text_into_chunks(text)
    rewritten_text = ''
    # pbar = tqdm(total=len(chunks), ncols=150)
    genai.configure(api_key=cfg['genai_api_key'])
    model = genai.GenerativeModel('gemini-pro')
    for chunk in chunks:
        # No trailing comma here: it would silently turn _prompt into a tuple.
        _prompt = f"{prompt}\n{chunk}"
        response = model.generate_content(
            contents=_prompt,
            generation_config=genai.GenerationConfig(
                temperature=0.1,
            ),
            stream=True,
            safety_settings=[
                {"category": "HARM_CATEGORY_DANGEROUS", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_HATE_SPEECH", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT", "threshold": "BLOCK_NONE"},
                {"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"},
            ],
        )
        # Streamed response: append each text chunk as it arrives.
        for _chunk in response:
            if _chunk.text is not None:
                rewritten_text += _chunk.text.strip()
        # pbar.update(1)
    # pbar.close()
    return rewritten_text
```
Storyboard Output
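The storyboard text that comes back follows the "Illustration N:" format requested in the prompt, so it can be split into one block per scene before feeding SD and TTS. A minimal parsing sketch (the format and field handling are assumptions based on the prompt above, not code from the project):

```python
import re

def parse_storyboard(text):
    """Split GPT storyboard output into one entry per illustration.

    Assumes the "Illustration N:" headers requested in the prompt;
    everything after each header, up to the next one, is kept as the
    scene body.
    """
    entries = []
    # re.split with a capture group keeps the illustration numbers.
    parts = re.split(r'Illustration\s*(\d+)\s*:', text)[1:]
    for num, body in zip(parts[0::2], parts[1::2]):
        entries.append({'index': int(num), 'body': body.strip()})
    return entries
```

Each entry's body still contains the labeled fields (scene description, clothing, location, and so on), which can be parsed further or handed to GPT to derive SD prompt words.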
SD Text-to-Image#
The prompt for SD is generated by GPT based on the storyboard text output above.
```python
from diffusers import StableDiffusionPipeline
import torch

model_path = "./models/cetusMix_Whalefall2.safetensors"
pipeline = StableDiffusionPipeline.from_single_file(
    model_path,
    torch_dtype=torch.float16,
    variant="fp16"
).to("mps")
# Fixed seed so repeated runs produce the same image for the same prompt.
generator = torch.Generator("mps").manual_seed(31)

def sd_cetus(save_name, prompt):
    image = pipeline(prompt, generator=generator).images[0]
    image.save('data/img/' + save_name + '.jpg')
```
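The derivation of prompt words from a storyboard entry (which the project delegates to GPT) can be sketched roughly like this; the field names and the leading quality tags are assumptions, not the project's actual mapping:

```python
def storyboard_to_sd_prompt(scene):
    """Flatten a parsed storyboard entry into comma-separated SD prompt tags.

    `scene` is a dict of the fields the storyboard prompt asks for;
    the quality tags at the front are a common SD convention.
    """
    quality = "masterpiece, best quality"
    fields = ['clothing', 'location', 'expression', 'behavior', 'environment']
    parts = [scene[f] for f in fields if scene.get(f)]
    return ", ".join([quality] + parts)
```

The resulting string can be passed straight to `sd_cetus` as its `prompt` argument.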
Image Effect
TTS Audio Generation#
There are many TTS options available online; here we use the one provided by edge.
```python
import edge_tts
import asyncio

voice = 'zh-CN-YunxiNeural'
output = 'data/voice/'
rate = '-4%'      # slightly slower than the default speed
volume = '+0%'

async def tts_function(text, save_name):
    tts = edge_tts.Communicate(
        text,
        voice=voice,
        rate=rate,
        volume=volume
    )
    await tts.save(output + save_name + '.wav')
```
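The remaining step, merging each image with its narration into a video clip, is not shown above; one common approach is to loop the still image over the audio with ffmpeg. A sketch of the command construction (the paths and encoder flags are assumptions, not the project's actual merge step):

```python
def merge_cmd(img_path, audio_path, out_path):
    """Build an ffmpeg command that loops a still image for the
    duration of the narration audio and muxes the two together."""
    return [
        'ffmpeg', '-y',
        '-loop', '1', '-i', img_path,   # still image as the video stream
        '-i', audio_path,               # TTS narration as the audio stream
        '-c:v', 'libx264', '-tune', 'stillimage',
        '-c:a', 'aac',
        '-pix_fmt', 'yuv420p',
        '-shortest',                    # stop when the audio ends
        out_path,
    ]
```

The returned list can be run with `subprocess.run`, and the per-scene clips can then be joined with ffmpeg's concat demuxer to produce the final video.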
Video Effect#
Video: "Chapter 1: Entering the Station_out" (https://live.csdn.net/v/embed/379613)