AI-powered chatbots have changed how people interact with computers. With the introduction of Grok by Elon Musk’s xAI, competition among them has intensified. Grok is positioned to challenge Sam Altman’s ChatGPT and Google’s Bard, and its distinctive features and capabilities make comparisons inevitable. In initial tests on middle school math problems and Python coding tasks, xAI reported that Grok outperformed “all other models in its compute class, including ChatGPT-3.5 and Inflection-1.” On GSM8k, a benchmark of middle school math word problems, Grok-1 scored 62.9%, surpassing GPT-3.5 and LLaMa 2 but falling short of Palm 2, Claude 2, and GPT-4.
Benchmark | Grok-0 (33B) | LLaMa 2 70B | Inflection-1 | GPT-3.5 | Grok-1 | Palm 2 | Claude 2 | GPT-4 |
---|---|---|---|---|---|---|---|---|
GSM8k | 56.8% 8-shot | 56.8% 8-shot | 62.9% 8-shot | 57.1% 8-shot | 62.9% 8-shot | 80.7% 8-shot | 88.0% 8-shot | 92.0% 8-shot |
MMLU | 65.7% 5-shot | 68.9% 5-shot | 72.7% 5-shot | 70.0% 5-shot | 73.0% 5-shot | 78.0% 5-shot | 75.0% 5-shot + CoT | 86.4% 5-shot |
HumanEval | 39.7% 0-shot | 29.9% 0-shot | 35.4% 0-shot | 48.1% 0-shot | 63.2% 0-shot | – | 70% 0-shot | 67% 0-shot |
MATH | 15.7% 4-shot | 13.5% 4-shot | 16.0% 4-shot | 23.5% 4-shot | 23.9% 4-shot | 34.6% 4-shot | – | 42.5% 4-shot |
Grok also completed the 2023 Hungarian national high school mathematics finals with a C grade at 59%, ahead of Claude 2, which scored 55%, but behind GPT-4, which received a B grade at 68%. Taken together, these results indicate that Grok-1 is already more capable than OpenAI’s GPT-3.5 but falls short of the latest model, GPT-4.
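The comparison with GPT-3.5 and GPT-4 can be checked directly against the benchmark table. The snippet below is a minimal sketch: the scores are copied from the table, while the names `SCORES`, `margins`, and `beats_everywhere` are made up for this illustration and do not come from xAI or OpenAI.

```python
# Scores (in percent) copied from the benchmark table above.
# Only the three models discussed in the text are included.
SCORES = {
    "GSM8k":     {"GPT-3.5": 57.1, "Grok-1": 62.9, "GPT-4": 92.0},
    "MMLU":      {"GPT-3.5": 70.0, "Grok-1": 73.0, "GPT-4": 86.4},
    "HumanEval": {"GPT-3.5": 48.1, "Grok-1": 63.2, "GPT-4": 67.0},
    "MATH":      {"GPT-3.5": 23.5, "Grok-1": 23.9, "GPT-4": 42.5},
}


def margins(model, baseline):
    """Return {benchmark: model score minus baseline score} in percentage points."""
    return {
        bench: round(row[model] - row[baseline], 1)
        for bench, row in SCORES.items()
    }


def beats_everywhere(model, baseline):
    """True if `model` scores strictly higher than `baseline` on every benchmark."""
    return all(delta > 0 for delta in margins(model, baseline).values())


if __name__ == "__main__":
    print("Grok-1 vs GPT-3.5:", margins("Grok-1", "GPT-3.5"))
    print("GPT-4 vs Grok-1:  ", margins("GPT-4", "Grok-1"))
```

On these four benchmarks Grok-1 leads GPT-3.5 everywhere, although by only 0.4 points on MATH, and GPT-4 leads Grok-1 everywhere, with its narrowest margin (3.8 points) on HumanEval.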
The most significant difference lies in Grok’s training and inference stack, which is custom-built on Kubernetes, Rust, and JAX. Grok runs on a proprietary LLM called Grok-1, which draws on real-time data from the X social media platform as well as web-scraped data. In contrast, ChatGPT relies on the GPT-3.5 or GPT-4 LLMs, which are trained on publicly available internet data and data licensed from third parties, with no live feed from X.
I think these are early days. Grok should become a more capable tool as it is trained on more data.