Public BigQuery

Fast and flexible queries of all Lens data in bulk and on demand.

Introduction

Lens already has indexers, which snapshot the data and transform it into a relational, super easy-to-understand database. This is how we get our speed on queries and how we can power all the LENS apps with the API. This saves the data into a centralised postgres database and allows you to query it using graphQL API. This is perfect for apps which need data on query time; for example, a user who is logging in wants to see their feed or see someone's profile. Querying this through the API is perfect and fast. The complication comes when you wish to bulk pull data, say you wish to do some ML training, some analytic, general data profiling. Right now, you would need a dedicated endpoint to surface that information, and of course, us creating these endpoints, which maybe you are only interested in. This does not scale. On top of this, having all this data and not allowing leverage for builders to do stuff feels wrong.

With this in mind, we have published the entire Lens social graph to a public BigQuery dataset. This means you can query all the data in bulk, whatever you want, in any way you wish, to any demand. Public BigQuery datasets give you a 1TB allowance every month, but if you want more, just add some GCP credits. This information should be accessible by anyone without them having to write the complex task of indexers and have all that infrastructure set up. This solves that, bringing you the entire LENS graph in the cloud to query anytime you wish.

What can I do with this?

You can pull huge data to do analytics say, a creator's analytics dashboard. You can pull huge amounts of day to run some ML code on it to showcase the discovery of publications. You can build your feeds with curated information very easily. This is the first step in allowing anyone to build custom features on top of LENS without the API has to do it for you. Think outside the box. You have the data. Go and build something epic with it.

How to query it

You should read this to understand how you query public data sets from BigQuery https://cloud.google.com/bigquery/public-data. You can do it from the google cloud console or any BigQuery client libraries, which support mostly all languages. The easiest way if you want to play around with it is using the Analytic hub, as you can run queries directly through it.

We have three public data sets to query from:

  • mumbai - lens-public-data.mumbai.schema_table
  • polygon - lens-public-data.polygon.schema_table
  • sandboxlens-public-data.sandbox.schema_table

please note you MUST query this from the US region. If you try to use the EU region it will not be able to find it

How it works

It is very simple how it works. The diagram examples the flows:

The data will always be 15 minutes behind. This is the fastest setting you can have turned on with the data syncer.

schema

We will dive into the DB setup for you; you can also generate an ERD on BigQuery to see how things lined up. The below will summarise all the tables and their usage; it won't go into full detail about their column types etc., as you can see in BigQuery for that.

All examples above have to include the schema in the query. For example, to query ens address reverse records, it would be

lens-public-data.polygon.ens_address_reverse_record

To see most available tables, you can run the following query:

SELECT table_name
FROM `lens-public-data.polygon.INFORMATION_SCHEMA.TABLES`

ens

This holds the reverse record for an address mapped to the ENS; it only looks at wallet addresses which own a profile.

proof_of_humanity

We index all owners of profiles to see if they have POH registered.

sybil_dot_org

We index all owners of profiles to see if they are part of the Sybil dot org

worldcoin

We index any worldcoin human verification mapped to the profile id. The last seen block info table is just the state the indexer cares about.

public

The public schema holds most of the logic most people will care about.

app stats

Apps can have their stats, aka what you did on lenster, for example. These tables do counters for actions on each app.

approval

As we explain in the docs within the API, when you transfer a follower NFT to someone, it could look like you follow them when you did not. To avoid that, we have approval tables in which when someone transfers a follower NFT to a wallet, it goes into an awaiting approval state until it is shown as you follow them.

module whitelists

Throughout LENS, you have different types of modules to collect, reference and follow. These modules have to be whitelisted at the current time. These tables hold the mappings to those modules.

publication/follow NFT ownership

After you collect/follow, that NFT is still tradable. These tables keep the owner status up to date even after the collection/follow has happened.

follower

When you follow people, it's the wallet which follows, not profiles, so these tables keep that state in order. Please note it only writes once to the follower table, even if someone has followed that profile 100 times with the same wallet. A wallet can follow many times.

hashtags

You can hashtag words in publication content, and these tables' job is to keep it current.

indexed transactions

This table holds all the transaction hashes it has indexed with the block info attached to it

last seen block info

The indexer uses blooms to work out if it needs to check the block, and if that block has the information it should care about, that means it does not check every single block. This table has the last block it did check; it is used if the indexer ever goes down, and it can pick up from where it left off.

mentions

You can @people on LENS and attach their handle; these tables are in control of keeping the state updated about the mentions happening on the platform.

notifications

Notifications often happen throughout the platform, and these tables keep all the notification information in sync.

profile

A very important factor of LENS is profiles. With this, you have many factors which hold state around the profile:

  • profile_dispatcher - what the dispatcher set for the profile
  • profile_follow_nft - where the profile follow NFT address is deployed (if any as lazy loaded)
  • profile_interests - any interests the profile has attached
  • profile_curated - curated profiles we know great good content to allow our explore curated API to work
  • profile_follow_module_details - the follow module details are currently set for the profile
  • profile_follow_module_details_history - a change history of follow modules set for the profile
  • profile_follow_module_follow_record - a list of follow modules redeemed
  • profile_nft_gallery - the profile NFT galleries context
  • profile_recommended_dismissed - the profiles that this address is not interested in following
  • profile - the main profile
    • is_metadata_processed - is a way to see if the profile metadata the user updated was completed
    • nft_* - is if they have an NFT linked to their profile picture

posts, mirrors and comments

Publications are the core heart of LENS. These tables keep all that information up to date.

Please note just because it is in here does not mean it was successful in following metadata standards. For successful publication the s3_metadata_location should be defined, the has_error should be false and the is_metadata_processed should be true.

Mirrors are in the profile_post table if it has an is_related_to_post or is_related_to_comment then it is a mirror.

publication stats

A list of all publication stats.

protocol state

The protocol state can change. This keeps track of it

publication access control

This holds the LIT access control logic around a publication. If it is pending, the access control has not been applied to a publication yet.

publication collect/reference modules

These tables hold the collected and reference module details for a publication. It also holds the records of all the collects which have been done on that publication.

reactions

You can react to publications these table store that state.

publication pointer

Maps a publication ID to a publication type.

publication tags

These tables hold the publication tags which have been applied

reorg block status

This marks blocks as safe and monitors the reorgs.