Griffin McCauley (MSDS '23) Takes on an LLM for Long Text Sequences

Laura Simis
August 30, 2023
Griffin McCauley (MSDS '23)

Recent M.S. in Data Science graduate Griffin McCauley (MSDS '23) spent his summer building and testing a novel LLM for long text sequences.

This week Hum, the leading provider of AI and data intelligence solutions for publishers, announced the open-source release of their LLM, Lodestone. As of August 2023, Lodestone is the highest-performing model of its size and sequence length on the MTEB Leaderboard.

Able to process approximately 3,000 words, an 8x increase over the most commonly used models published by Google, Facebook, Microsoft, and LLM researchers, Lodestone can contextualize an entire research paper, surfacing content insights that previous models, limited to one or two paragraphs at a time, could not capture. This makes it particularly compelling for real-time applications on long text, where larger models may be prohibitively expensive or slow.

McCauley completed his capstone project with Hum earlier in the year, helping the team lay the groundwork for Lodestone. After his graduation, the Hum team approached him about continuing his work over the summer while he searched for a full-time position.

McCauley was an integral part of the build, working closely with Lead Software Engineer Dylan DiGioia and Lead Data Strategist Will Fortin. They used Google's BERT architecture as a foundation, incorporating several improvements released since then to develop a novel model for processing longer text sequences.

He also took the lead on model training, working with a large, publicly available dataset that includes over 1 million scholarly research articles and publications.

"We're grateful for Griffin's contributions," said Dustin Smith, Hum's Co-founder and President. "This was our first foray with new techniques like FlashAttention, Attention with Linear Biases (ALiBi), and Gated Linear Units (GLU), which allowed Lodestone to attain its processing limit of 4096 tokens, roughly the size of an academic article."
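ALiBi, one of the techniques Smith mentions, replaces learned positional embeddings with a per-head penalty added to the attention logits that grows linearly with the distance between query and key positions, which is part of what lets a model handle longer sequences than it was trained on. A minimal sketch of the idea (not Lodestone's actual implementation), using a symmetric, encoder-style distance and a hypothetical helper name:

```python
import numpy as np

def alibi_bias(num_heads: int, seq_len: int) -> np.ndarray:
    """Build the ALiBi bias tensor of shape (num_heads, seq_len, seq_len)."""
    # Per-head slopes form a geometric sequence: 2^(-8/n), 2^(-16/n), ...
    slopes = np.array([2.0 ** (-8.0 * (h + 1) / num_heads)
                       for h in range(num_heads)])
    # |i - j| distance between every query position i and key position j
    pos = np.arange(seq_len)
    dist = np.abs(pos[:, None] - pos[None, :])
    # Farther token pairs receive a larger (more negative) penalty,
    # so attention naturally favors nearby context without any
    # position embeddings.
    return -slopes[:, None, None] * dist[None, :, :]
```

In use, this bias is simply added to the raw attention scores before the softmax; because it is computed from positions rather than learned, the same formula extends to sequence lengths unseen during training.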

Developers and enterprises can fine-tune and deploy their own models on Hugging Face, putting long-sequence AI applications in reach of more projects and businesses.

"This model might only live in our imaginations if not for his hard work bringing it to life," said DiGioia.