
Small Data SF 2024


I can’t remember how I heard about Small Data SF 2024, but it caught my eye. The mix of sessions had me interested in going, especially with MotherDuck and DuckDB being the main sponsors. I’ve run into DuckDB a few times in the last couple of months, so I was interested in what I could learn about small data from a different group of people than I normally see at events.

When a customer visit was cancelled, I requested the learning and development (L&D) time and budget and got it approved. I booked flights and a hotel and headed to San Francisco.

The structure of the conference was interesting to me. I’ve been lucky to get to a few smaller conferences in the past (100-200 people) and I like them. SQLBits, PASS Data Community Summit, and other large events are nice, but I tend to prefer small ones.

This conference had the tagline "Think small, develop locally, ship joyfully." There were other tags, and you can read their manifesto, but essentially this conference looked at the idea that lots of work with data (OLTP or analytics) can be done on small sets, with local databases or local data.

Day 1

Day 1 started late, at 12 pm with lunch. I liked that, though I took advantage of the late start to sleep in and get a late breakfast, so I mostly wandered around with a coffee and chatted with people. Food was nice, as it always is in San Fran. Lots of dietary choices, and mixes of stuff. The event was in a co-working facility, so there were always snacks around (chips, nuts, fruit, etc.).

Day 1 was two workshops. Each was 3 hours, with a break between them. Essentially these were vendor sessions for hands-on work with a product. There was a happy hour after, but I skipped it.

My first workshop was from MotherDuck, a vendor building on DuckDB. The workshop was based on this GitHub repo, and showed how to use dbt to move some data around. It was hands-on, and things worked well for me, but this was a mix of CLI work, Python, database work, and more. Some people definitely struggled with the workshop. I found this interesting, and I learned a few things. I’m definitely interested in doing some DuckDB work to analyze data in a way that is different (and simpler) than Snowflake or Fabric. I could see people doing this.

The second workshop was from Outerbase, which essentially is a way to work with multiple databases on the web. It’s a light Object Explorer/query tool in some ways, but they’re also trying to do some AI work to help stub out a web interface for your database. They had us try to build some methods and web code that we could paste into React, Angular, or another framework. This one was OK, but I am not sure this is a great use of AI. I was hoping for a bit more.

Day 2

Day 2 was all day, from 8:30 to 5:30. I arrived to find a lot of people getting breakfast. Again, hot food, cold food, GF, etc. Lots of choices. One cool thing was a coffee bar where you could get baristas to make a nice drink, and you could also get a MotherDuck mug for your drink. I have too many mugs, but I liked this one, so I got one.


This was a one-track conference, which I also like. Everyone gets a shared experience, we have common things to talk about, and things change often. I also don’t have to go find rooms. In this case, there was one large room with a low stage.


New talks every 20 minutes, on a variety of topics. The agenda was wide and varied. I think a few talks were meh, but most were interesting. I’ve got some editorials coming, but the first talk on Big Data was great, as was the second one on different tooling we might use for both development and analysis of smaller sets of data.

Note that small data doesn’t mean KB or less. The idea is that many queries can be run on gigabytes of data on a laptop, and with today’s network and laptop capabilities, this can make sense. There were also some limited-domain views of how you might shard your data across lots of databases and do more work locally, rather than through connections to a central database. That makes sense in some cases, but not all.
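To make the sharding idea concrete, here is a minimal sketch of "one small database per customer" instead of one central server. It uses Python's built-in sqlite3 module as a stand-in (not DuckDB, and not anything shown at the conference), and the `LocalShards` class, the `orders` table, and the customer names are all hypothetical.

```python
import sqlite3


class LocalShards:
    """Hypothetical router that keeps one small database per customer."""

    def __init__(self):
        self._shards = {}

    def shard_for(self, customer_id):
        # Create the customer's database on first use.
        # In-memory here for the sketch; a real setup would likely use one file per customer.
        if customer_id not in self._shards:
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE orders (item TEXT, amount REAL)")
            self._shards[customer_id] = conn
        return self._shards[customer_id]

    def add_order(self, customer_id, item, amount):
        self.shard_for(customer_id).execute(
            "INSERT INTO orders VALUES (?, ?)", (item, amount)
        )

    def total(self, customer_id):
        # A per-customer query touches only that customer's database.
        row = self.shard_for(customer_id).execute(
            "SELECT COALESCE(SUM(amount), 0) FROM orders"
        ).fetchone()
        return row[0]

    def total_all(self):
        # A cross-customer question has to visit every shard and combine the results.
        return sum(self.total(cid) for cid in self._shards)


shards = LocalShards()
shards.add_order("acme", "widget", 10.0)
shards.add_order("acme", "gadget", 5.0)
shards.add_order("globex", "widget", 7.5)

acme_total = shards.total("acme")
grand_total = shards.total_all()
print(acme_total, grand_total)
```

The per-customer query stays simple and local, while `total_all` hints at where this gets harder: anything that spans customers has to pull results from every shard.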

I also think some of the speakers (quite a few startup people) minimize or don’t think about the true scale problems when workloads grow, or about the hassles of pulling all this data together and syncing it. In any case, I think their ideas work for some problem domains.

There were a few panels, as well as a presentation of a paper from Amazon. One super interesting thing was that Redshift shows something like a 60:40 split of reads to writes. That seems crazy. However, an exec from Fivetran said that matches their experience, where many data warehouses are running constant updates from OLTP systems, something that their customers sometimes don’t realize. He wasn’t sure this was a good idea either, but it’s been good for their business.

As often seems to be the case, there was a satirical talk on BI tools and how they don’t always help. An analyst for one of the political campaigns gave a humorous look at the world of vendors and customers.

After the last talk, there was a short happy hour, where I had the chance to chat with a few people. Silicon Valley is a strange place, full of people working in startups, formerly from startups, or wanting to start one. Everyone has a good idea, which I think is true, and so many of them want to chat about their thing or your thing.

As you might expect, a lot of people are AI-focused or thinking about AI. It’s neat to hear their experiences and what they think. Certainly I saw some neat demos of using small models (again, small data) and feeding a user query into the model along with some data from a database or a flat file. That was interesting, and something I think could be useful in focused ways. I expect more and more people to get comfortable with AI-based work.
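The pattern in those demos can be sketched roughly like this: pull a few rows from a local database, then put them in the prompt next to the user's question. This is only an illustration under my own assumptions. The `build_prompt` helper, the `sales` table, and the prompt wording are hypothetical, sqlite3 stands in for whatever database the demos used, and the actual call to a model is left out.

```python
import sqlite3


def build_prompt(question, columns, rows):
    """Combine a user question with a small result set into one prompt string."""
    header = ", ".join(columns)
    lines = [", ".join(str(value) for value in row) for row in rows]
    data = "\n".join([header] + lines)
    return "Answer using only this data.\n" + data + "\n\nQuestion: " + question


# A tiny local table standing in for "some data from a database".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, total INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("west", 120), ("east", 95)])

cursor = conn.execute("SELECT region, total FROM sales ORDER BY region")
columns = [col[0] for col in cursor.description]
prompt = build_prompt("Which region sold more?", columns, cursor.fetchall())

# The resulting string is what would be sent to a small model.
print(prompt)
```

Keeping the query small and specific is the point: the model only sees the handful of rows it needs, which fits the small data theme.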

Ultimately, I had a nice, refreshing two days that got me thinking about data differently and how there are different ways to approach problems and solutions. Perhaps one of the neater things I saw was PySheets, Python in spreadsheets. Just don’t try it in Chrome, and make sure to use the little A* button to test the AI.
