AI Breakfast Shanghai

AI Breakfast #18 Notes - Grok4

Executive Summary

At our AI Breakfast #18, our group of entrepreneurs, developers, consultants, and AI engineers discussed AI technologies they're grateful for, the challenges of working with AI in China, and misinformation in AI. Attendees also shared updates on their work and projects, including Lumi, a reading app for kids, and AI agents from Communication Intelligence Labs that assess speaking skills in schools.

Member Projects

One attendee, a founder building a platform to automate business operations, showcased progress on a side project called Lumi, a reading app designed to help children practice German pronunciation. The app features short stories that kids read aloud, and it analyzes their speech for accuracy, speed, and completeness using tools like Microsoft's speech recognition. It also breaks results down to individual words so users can practice tricky pronunciations, and the story illustrations are generated with AI tools like Nano Banana Pro.
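The three measures mentioned above (accuracy, speed, completeness) can be illustrated with a small word-level comparison. This is only a sketch under stated assumptions, not how Lumi or Microsoft's speech service computes its scores: it assumes a recognizer has already returned a plain-text transcript and a duration, and the function names and metric definitions below are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(text):
    # Lowercase and keep only word characters so punctuation is not a mismatch.
    return re.findall(r"\w+", text.lower())


def reading_metrics(reference, recognized, duration_seconds):
    """Compare a recognized transcript with the story text (hypothetical metrics)."""
    ref = tokenize(reference)
    hyp = tokenize(recognized)
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)

    # Words that appear, in order, in both the story and the transcript.
    matched = sum(block.size for block in matcher.get_matching_blocks())

    # Story words that were skipped or recognized as something else.
    missed = [
        word
        for tag, i1, i2, _, _ in matcher.get_opcodes()
        if tag in ("delete", "replace")
        for word in ref[i1:i2]
    ]

    return {
        # Share of the story that was read correctly.
        "completeness": matched / len(ref) if ref else 0.0,
        # Share of spoken words that match the story.
        "accuracy": matched / len(hyp) if hyp else 0.0,
        # Reading speed in spoken words per minute.
        "words_per_minute": len(hyp) * 60 / duration_seconds if duration_seconds > 0 else 0.0,
        # Candidates for word-by-word practice.
        "words_to_practice": missed,
    }


if __name__ == "__main__":
    story = "Der Hund läuft schnell nach Hause."
    transcript = "der hund lauft schnell hause"
    print(reading_metrics(story, transcript, duration_seconds=3.0))
```

A word-level diff like this only catches skipped or substituted words. Judging how well a word was pronounced needs phoneme-level scoring from the speech service itself.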

Another participant, who founded Communication Intelligence Labs, demonstrated their AI agents that evaluate students' speaking abilities in classrooms. By analyzing video recordings of presentations, the system scores skills across categories like content, organization, English proficiency, voice, and body language, using a proprietary rubric. It provides detailed feedback and timestamps issues in the video. The system is customized for schools, with pilots in North America and Vietnam and plans to expand into China.

These projects highlight how AI is making education more accessible, from self-paced language practice to automated assessment tools that help teachers track student progress without manual grading.

AI Technologies We're Grateful For

The group shared various AI features they're thankful for, starting with simple voice-to-text tools that make everyday tasks like setting alarms or note-taking effortless. Real-time translation in devices like smart glasses impressed others, enabling seamless communication across languages, such as subtitling conversations in Chinese environments.

Image generation tools like Nano Banana Pro sparked excitement for letting non-artists visualize ideas quickly, while coding assistants like GitHub Copilot were praised for transforming how people learn and build software. UI/UX design aids, such as Vercel v0, were highlighted for generating full designs from brief descriptions, including color schemes and layouts, which proved invaluable for developers without design skills.

Discussions also touched on AI's role in language learning, where tools supplement study but raise questions about whether perfect translations might reduce the need to learn new languages at all.

Challenges of Working with AI in China

Attendees discussed the "China tax" – the time lost fiddling with VPNs and proxies to access blocked tools like Claude or Google services, which disrupts workflows and requires constant workarounds. One attendee shared experiences with models like Composer in Cursor, noting its speed but also possible access issues in the future, while others mentioned setting proxies for terminals and Docker to maintain connectivity.
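For readers unfamiliar with the terminal and Docker proxy setup mentioned above, here is a minimal sketch of what it usually involves. The proxy addresses and function names are hypothetical placeholders, not anything an attendee shared. The script only prints the settings: the standard proxy environment variables for a shell, and a "proxies" section for the Docker client configuration file (~/.docker/config.json), which passes proxy settings into containers and builds.

```python
import json

DEFAULT_BYPASS = ("localhost", "127.0.0.1")


def shell_exports(proxy_url, no_proxy=DEFAULT_BYPASS):
    """Return export lines for a shell profile (upper- and lowercase variants)."""
    values = {
        "HTTP_PROXY": proxy_url,
        "HTTPS_PROXY": proxy_url,
        "NO_PROXY": ",".join(no_proxy),
    }
    lines = []
    for name, value in values.items():
        # Some tools read only the lowercase names, so emit both.
        lines.append(f'export {name}="{value}"')
        lines.append(f'export {name.lower()}="{value}"')
    return "\n".join(lines)


def docker_client_config(proxy_url, no_proxy=DEFAULT_BYPASS):
    """Return the "proxies" section for the Docker client config file."""
    return {
        "proxies": {
            "default": {
                "httpProxy": proxy_url,
                "httpsProxy": proxy_url,
                "noProxy": ",".join(no_proxy),
            }
        }
    }


def render_docker_config(proxy_url, no_proxy=DEFAULT_BYPASS):
    # Pretty-printed JSON, ready to merge into an existing config file by hand.
    return json.dumps(docker_client_config(proxy_url, no_proxy), indent=2)


if __name__ == "__main__":
    # Hypothetical local proxy; replace host and port with your own.
    print(shell_exports("http://127.0.0.1:7890"))
    # Inside a container, 127.0.0.1 is the container itself, so the proxy
    # must be reachable under another address (hypothetical host name here).
    print(render_docker_config("http://host.docker.internal:7890"))
```

Image pulls are performed by the Docker daemon rather than the client, so the daemon needs its own proxy configuration in addition to the settings above.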

Fundraising for AI edtech in China proved tricky, with geopolitical tensions making investors wary of mixed US-China operations, and advice centered on proving revenue in international schools before scaling. The conversation revealed how data restrictions and the need for local servers complicate building global products from China.

Historical context emerged too, with long-term residents recalling how China's internet evolved after the 2008 financial crisis, when blocks on sites like Google intensified and reshaped the landscape for tech work.

Reading, Learning, and AI Assistance

The conversation turned to reading skills, with attendees debating the value of reading aloud for better pronunciation versus speed-reading techniques that suppress subvocalization for faster comprehension. Attendees shared personal experiences, like using AI to accelerate learning or creating plugins to convert visual information into audio for easier absorption.

OCR tools for processing PDFs, especially those with charts and images, were explored, with dots.ocr from Xiaohongshu praised for accurately segmenting pages into text, images, and LaTeX. However, trouble with grainy documents and the need for custom setups showed that digitizing knowledge still has hurdles.

AI's potential to scale knowledge work, as AlphaFold did for protein structure prediction and as xAI's Grokipedia attempts by rebuilding a Wikipedia-style knowledge base, was seen as a game-changer for accessing niche or archived information.

Misinformation in AI

Concerns about misinformation on social media led to discussions on how AI models might ingest false data from bots or biased sources, potentially spreading inaccuracies. Attendees noted efforts like xAI's Grokipedia, which aims to offer an unbiased alternative to Wikipedia, and emphasized iterative fact-checking and transparent policies.

The group reflected on Wikipedia's strengths in crowd-sourced editing and its vulnerability to manipulation, contrasting it with AI's ability to scale content creation across topics. Digitizing physical archives remains a barrier to comprehensive knowledge.

Other Resources

  • Voice-to-Text on Mobile Devices: Built-in feature for dictating text and commands on phones. Attendees appreciated its simplicity for tasks like setting alarms, noting it as an "old school AI" unlock for productivity.
  • Real-Time Translation in Smart Glasses: AR glasses with microphone arrays for subtitling conversations. Used effectively in multilingual settings like offices in China, with praise for accuracy despite some errors.
  • Nano Banana Pro: AI image generation tool for creating visuals from prompts. Grateful for enabling non-artists to express ideas, though prompting techniques are needed to avoid default styles.
  • GitHub Copilot: AI-powered coding assistant integrated into development environments. Transformed learning to code, with one attendee crediting it for entering the field after initial hesitation.
  • Vercel v0: Tool for generating UI/UX designs from text descriptions. Helped a developer without design skills create color schemes and layouts quickly, serving as a strong starting point for projects.
  • Microsoft Azure Speech Services: Speech recognition API used in apps for pronunciation analysis. Provided detailed feedback on reading speed and accuracy in the Lumi app, though alternatives like Whisper require additional processing.
  • dots.ocr: Open-source OCR tool for PDFs, excelling at segmenting images, text, and LaTeX. Best for handling documents with graphs, running affordably on platforms like Hugging Face despite slower speeds for large files.
  • Claude: AI model from Anthropic for various tasks. Valued for productivity but frustrating when inaccessible due to regional blocks, highlighting dependency on such tools.
  • Grokipedia: AI-generated knowledge base from xAI, powered by Grok, aiming to rebuild Wikipedia iteratively. Seen as impressive for scaling unbiased content, though debates arose on handling controversial topics compared to human-edited sources.
  • AlphaFold: Protein structure prediction tool from DeepMind. Transformed biology by predicting structures for nearly all catalogued proteins within about a year of its public release, exemplifying AI's power to scale human efforts far beyond manual work.