Fine-tuning GPT-3.5-turbo for natural language to SQL (opens in new tab)

(medium.com)

208 pointssaigal2y ago75 comments

75 comments

56 comments · 16 top-level

bob10292y ago· 10 in thread

We've been chasing this rabbit since the beginning. It currently seems to be uncatchable for the use cases that would be most valuable to us - writing complex queries we've never seen before. Our use case seems to confound any notion of training or fine-tuning, since the cases we need the most help with are also the ones we have the fewest examples of.

Instead of going for generative, few-shot models, I am starting to look at the other end of the spectrum: Binary classification into deterministic query building.

With ChatGPT, you cannot realistically explain why you got some output in a way that anyone other than an AI/ML expert would find satisfying. With binary classifiers, you can precisely explain how some input resulted in some output in terms that a business person could easily digest - "You mentioned this table so it assumed you wanted to constrain on XYZ. Here's the trace from the classifiers...".

I've proposed a few schemes where you define groups of classifiers for each SQL building concern - Which tables are involved, which columns, is a join or aggregate implied, general context of business use, etc. Clearly, there are holes with this scheme, but in our domain we could plausibly fine-tune our humans to be a little bit more verbose in their use of the automagic SQL vending machine. Several hours spent training humans is probably a lot cheaper & easier than getting ChatGPT, et. al. to consistently play by our rules.

sillysaurusx2y ago

I think you've fallen into the trap of "AIs don't generalize, they memorize." But they do in fact generalize. The reason ChatGPT is so valuable is precisely because it can help out with situations that have never been seen before, not because it merely unlocks old preexisting knowledge. The fella who saved their dog with ChatGPT comes to mind. https://nypost.com/2023/03/27/chatgpt-saved-my-dogs-life-aft...

PheonixPharts2y ago

I think you have fallen into the trap of mistaking interpolation for generalization.

Working with these models every day, it's clear that they can certainly interpolate between points in latent space and generate sensible answers to unseen questions, but it's pretty clear that they don't generalize. I've seen far to many examples of models failing to display any sense of generalization to believe otherwise.

That's not to say that interpolation in a rich latent space of text isn't very useful. But it's not the same level of abstraction that comes from true generalization in this space.

4 more replies

Zircom2y ago

Did you read the article you posted?

-ChatGPT initially gives the same diagnosis the vets did, Babesiosis

-It notes that Babesiosis may have either been a misdiagnosis or there may be a secondary condition/infection causing the remaining symptoms after the Babesiosis treatment didn't resolve all of them

-It suggests such a hypothetical secondary condition could be IMHA, which the article notes is an extremely common complication of Babesiosis with this specific dog breed

-A quick Google search brings up a fair amount of literature about the association between Babesiosis and IMHA

So in fact this is the opposite of a never before seen situation, ChatGPT was just regurgitating common comorbidities of Babesiosis and the vets in question are terrible at their job.

1 more reply

mejutoco2y ago

> With ChatGPT, you cannot realistically explain why you got some output in a way that anyone other than an AI/ML expert would find satisfying

IMHO OP is talking about "explainability" of the results, which is notoriously bad for current AI. For certain applications (idk if SQL would be one but mortgage application might be one) it is required to be able to explain how the computer got to the decision.

bob10292y ago

> I think you've fallen into the trap of "AIs don't generalize, they memorize."

Binary classifiers don't generalize?

Just because my output is not generative does not mean we are cannot learn / generalize elsewhere. Think of it as a 2-stage process.

lmeyerov2y ago

Yeah louie.ai query generation internals feel more like classic program synthesis, where's it's more about staged compilation, iterative refinement, etc, and we use gpt3/gpt4 for each stage rather than a traditional solver. "Which tables would help this query, which parts should go to text search vs vector search, ...", and we iteratively rewrite and pass along until we reconstitute.

At the same time, we're also moving to make a lot more of that process AI controlled, just across LLM calls (e.g., LLM-dictated ones), so it's a funny maturity process.

Nick_Molter2y ago

Curious to learn more about your use case. If fine-tuning is only ineffective for your most complex queries (and presumably those are less frequent as well, since you mentioned you have few examples), then couldn't you use fine-tuning to handle the simpler queries (presumably the lion's share) and thus free up excess man hours to focus on the more complex queries? Is there any benefit to AI being able to answer 90% of queries vs 0%?

bob10292y ago

These tools are already fantastic at our 80% average case, even without fine tuning. We are seeing some value, but the real pain is in that other 20%.

MuffinFlavored2y ago

> Binary classification into deterministic query building.

I have no clue what this means.

philipodonnell2y ago

I tried something similar to this so I assume it’s to classify a table as being part of the resulting query or not, and using the schemas of the resulting tables in the prompt for the question. I found it just as useful to give it the full schema and ask it for a list of relevant tables, certainly more cost effective.

h1fra2y ago· 10 in thread

The cost per question seems super high. I can't even think of an API where a single call would cost $1cent. You better have a good pricing model to follow up on this.

beebmam2y ago

Really? I sure can. I can even imagine a single API request that triggers a job that costs $100,000. These APIs exist. They're not for individuals.

h1fra2y ago

I mean yes, I have managed BigQuery jobs that cost thousand of dollars per run but the actual call is not the thing that is expensive and I'm in control of what it costs me. It's not exactly the same scenario imo

1 more reply

riku_iki2y ago

> I can't even think of an API where a single call would cost $1cent.

paying engineer for the same job will be many factors more expensive.

jug2y ago

This is basically what put me off when trying to use GPT-3.5 for this with SQL LangChain. And you can’t go cheaper LLM from OpenAI because then accuracy plummeted. GPT-3.5 is the sweet spot.

Sure, some use cases might work but it’s not going to be a thing that Just Works™ for products even accuracy issues aside. There’s just so much data to feed into each and every prompt, schemas and all. Many of them too if you want to enable joins.

claytonjy2y ago

Google Speech-to-Text is one I've dealt with recently; costs over 1¢ to transcribe 1 minute of audio. Almost 8¢/minute for the medical version.

h1fra2y ago

1¢ per 1minute is fair as it would take the best translator much more time than that. 1¢ per sql query seems less fair to me, but we could argue it would also cost more to ask a dev.

1 more reply

Legend24402y ago

What? Some of the APIs I use at work cost $100+ per call. (it's not really about the call, it's about the data it provides)

IanCal2y ago

It's about 0.2 cents right?

It's hard for something you interact with manually to provide positive value less than 0.2c.

thewataccount2y ago

That's for the non-finetuned GPT3.5turbo model.

Finetuned is 1.2cents/1k in and 1.6cents/1k out. So it'll likely be closer to 2cents depending on what you're doing.

I'm not saying it's not useful, at 2c per query you have to be more "purposeful" as they could certainly add up depending on how you use it compared to 0.2c.

1 more reply

Zetobal2y ago

If your app is able to function better for the cost of the API calls the value might just be there.

totalhack2y ago· 4 in thread

I think natural language to SQL is a worthwhile goal and will have its use cases, but perhaps fine-tuning a model on a semantic layer and having it talk to a semantic data modeling API will yield a more useful/practical result. You may even find the semantic layer makes a natural language approach moot in many cases.

In Zillion* I added an experimental feature that uses OpenAI to form a report API call from natural language. Fine tuning (vs prompt tuning) this on the semantic model with gpt-3.5 would probably yield some notable improvement in abilities, as I view it as more of a toy feature at the moment.

* https://github.com/totalhack/zillion

antupis2y ago

Yeah it is literally billion dollar question how to do that querying. Is there some good papers or blog posts that go through different approaches?

totalhack2y ago

I think there are probably a million posts and papers talking about text to SQL as it's a problem that's been tackled for a long time now...just with more vigor recently now that LLMs provide an improved/shiny approach. I think many companies now offer or consider themselves semantic layers in some fashion (google "semantic layer" + tableau, cube.dev, dbt...). I have not come across any blog or paper comparing text-to-sql directly to text-semantic-api. Certainly the latter is not as flexible as the former in that it can't produce every possible SQL query. In practice (for me at least) that's often not necessary, or is necessary rarely so you just do your direct SQL as needed and use your semantic layer >95% of the time.

noorvir2y ago

Can you describe in more detail what you mean by semantic? Doesn’t the DB schema capture the business logic to a large extent?

totalhack2y ago

From our good friend GPT-4 who can summarize it better than I can...

A semantic data layer is an abstraction layer that sits between the raw data stored in databases and the applications or analytics tools that use this data. The purpose of this layer is to provide a consistent, business-friendly interface to data that might be stored in a variety of formats, tables, or systems. By doing so, the semantic data layer helps in simplifying complex data structures into more meaningful, understandable models. Key Features:

    Unified View: It provides a unified view of data from multiple sources, making it easier for users to access and understand the data without having to know the underlying structure or complexity.

    Data Governance: It can enforce business rules, security policies, and data quality measures, ensuring that data is consistent, compliant, and accessible only by authorized users.

    Flexibility: A semantic layer is often designed to be flexible, allowing business users to adapt to changes in the data model without requiring changes to the applications themselves.

    Query Simplification: The layer simplifies the process of querying data by providing a more user-friendly way of accessing and manipulating data, often through a drag-and-drop interface or other graphical tools.

    Decoupling: It decouples application development from data source changes. When underlying data changes, you don't necessarily have to update all the applications that use it; you might only need to update the semantic layer.

    Data Integration: It can integrate data from multiple sources, providing a single "source of truth" for business users and applications.

Purpose:

    Simplification: Make data more accessible and easier to understand for non-technical users.

    Consistency: Ensure that everyone is working from the same definitions and business rules.

    Efficiency: Reduce the time and complexity involved in generating reports, analytics, and other data-driven functions.

    Data Quality: Help to enforce data quality and governance policies.

    Security: Provide a mechanism for enforcing security rules on who can see or modify data.

    Adaptability: Enable quicker adaptation to changes in business needs or data structures.

By providing a semantic data layer, organizations can ensure that their data is not only high-quality and secure but also that it can be easily used for making informed decisions. This is particularly valuable in today's data-driven business environment.

philipodonnell2y ago· 3 in thread

Are you planning on submitting this model to be evaluated against the Spider holdout set?

Also, wondering if anyone has found research on the inverse of this approach to the problem, i.e., instead of training the model to understand the data, you improve the data to be more understandable to a model? This seems more promising when you are looking at enterprise use cases without much training data. Spider seems like quite a simple dataset compared to the ones I encounter on the job, and LLMs struggle even with those.

aazo112y ago

In theory one could create domain specific (or industry specific) templates for data. However coming up with a universal structure might be challenging since data is so varied.

Since the issue is often the context, plugging in data dictionaries (and passing those to the LLM) can help

MrezaPourreza2y ago

Yes, we have already submitted the model for evaluation on the Spider holdout test set. While your suggestion is certainly intriguing, implementing a universal solution could be quite challenging, as it would heavily depend on the specifics of the dataset.

philipodonnell2y ago

I don’t think it’s necessarily about a “universal” solution, just “better”. Make the column names more verbose, changing numeric enums to text ones, disambiguating column names, etc. One of the spider datasets is a stadium table and one of the column names is “average”, which means average capacity, but it’s super ambiguous. If you asked an LLM to “make these table columns more verbose” I bet it would call that “average_capacity” and all of the sudden some NLQ queries that confused the function and the column name would start to work.

codeulike2y ago· 3 in thread

I often do complicated reports in SQL or complicated transformations in SQL for system migrations. To really write a query and 'get it right' you usually need insider knowledge that you can't glean from the column names.

I see in their training set they've got comments about columns too. e.g.

    "Country" TEXT NOT NULL, - country where the singer born

But thats still not enough.

You also need a bunch of information about the real business that the data is describing. And you also need to analyse the whole database - is that field actually used? What are the common values for this picklist? What does that status actually mean in terms of business? If there are two of those rows that match the join but I want to avoid duplicates, which one do I take? - the newest, or the one with a certain status? etc.

codeulike2y ago

Also, with complicated queries or reports, there's quite a big element of "do the results look roughly as I expect?" - e.g does the total income for last month look about right? Then you might do a detailed reconciliation and find out your total is 1% off due to not accounting for refunds and then you go and adjust your query and try again.

ainesh932y ago

> You also need a bunch of information about the real business that the data is describing.

While the article focuses on finetuning GPT-3.5-turbo, how you use the text-to-SQL engine within the architecture of your overall solution is for you to decide. Providing this business context from vectorized context stores in the actual prompt would be a step in the right direction.

capableweb2y ago

Why dont you collect a sample of the data so it knows what to expect then? Include 10 handpicked rows in your prompt and it should be able to pick up the patterns. If you find it doesn't handle something, append that row and restart.

iandanforth2y ago· 3 in thread

Spider isn't anything like the queries that analysts and data scientists write against DBs. I don't think it even has many joins. Do you not have access to a more realistic training set?

philipodonnell2y ago

Agree that it’s not representative of real world queries, but I think it’s more of a “measure progress against a simple but consistent baseline” and SOTA is still struggling so it’s clearly not ready for real queries. Most of the papers on this topic mention a lack of large volumes of high quality training data… Spider is probably the best public one right now.

tmostak2y ago

I agree that Spider queries are not necessarily representative of the SQL you might see in the wild from real users, but looking at some analysis I did of the dataset around 43% of the queries had joins, and a number had 3, 4, or 5-way joins.

ainesh932y ago

Although Spider is better known in the text-to-SQL world, you're right that BiRD may provide a better testing ground. Comparing against the current leaderboard on that standard is on the docket!

TedDallas2y ago· 2 in thread

At a certain level of complexity it is easier to write the damn SQL than it is to explain your query to GPT.

gloosx2y ago

In the end, the one who possesses knowledge will turn it into value going the most obvious and straightforward way: writing a piece of useful SQL or code hot from the head.

GPT is a tool for learners, and they will keep shooting them feet until they learn, just the weapon is different now.

Look at the law system, written in natural language. It is misleading and doesn't work well, so we have to gather thousands of people around courts doing non-deterministic work in order to process them. Natural language is a tool for learning, not making systems. You have to shrink your vocabulary down to code at some point in order to make systems, and you can do it much faster and better than GPT.

saigalOP2y ago

True. The goal would be write the complex SQL query once, then file it away as golden SQL so that in the future a similarly complex query could be done in natural language.

sandGorgon2y ago· 2 in thread

do you not use a vector db and embeddings search to get the table structure, etc ?

aazo112y ago

Most RAG approaches use a vectorDB and embeddings for schema linking. In this case the fine-tuning is handling schema linking and there is no vectorDB.

sandGorgon2y ago

that is super interesting. Do you find that schema-linking using finetuning works better ?

lukev2y ago· 1 in thread

This brings up a question I've had about OpenAI's fine-tuning APIs.

The training format is a series of question/answer pairs: i.e, what we might think of as "supervised" learning. It can be challenging to build good data sets for this scenario.

But transformer-based models actually don't require supervised learning: the bulk training of the GPT family is to just throw masses of text at it and have it do next-token prediction.

What's going on with the difference? When I "fine-tune" GPT-3.5 turbo, am I actually training the transformer, or the RLHF model that sits on top? Or both?

More to the point, is there any way to fine-tune OpenAIs models in a n "unsupervised" fashion? I.e, if I want to teach a model SQL, do I need to get a curated data set of question/answer pairs, or I can I just dump in a bunch of schema and SQL queries that will in theory make the model "better at SQL" in a generic way?

mistercow2y ago

My understanding is that there’s no RLHF model sitting on top of ChatGPT. There’s a separate reward model during training, but that reward model is just used to fine-tune the decoder model. This is sort of confusing in the literature, because they will switch from calling it a “language model” to a “policy”, once they’ve used PPO, but if you trace the citations back to [1], you can see that the policy is initialized with a pretrained language model (in that case, GPT-2).

What I don’t know is how they’re applying RLHF after user fine-tuning. Are they redoing the PPO with the original reward model after tuning on your input? Are they just letting it slide, and hoping that the fine-tuning doesn’t cause the model to forget the RLHF? It’s unclear from what I’ve read.

[1] https://arxiv.org/pdf/1909.08593.pdf

shireboy2y ago· 1 in thread

I think natural language to SQL may not be as great as it sounds in many real-world applications. You often have permissions, tenants, rules about who-can-see-what. I love the idea of letting users use natural language to query and possibly even bulk-update data. But if the app has rules like those, this would just be a SQL injection vulnerability. You could possibly limit it - just allow query over certain views or something - but the risk of the GPT crafting a dangerous query is still high.

What would be more useful IMO is natural language to OData, GraphQL, and OpenAPI/Swagger. Then you could let users do ad-hoc query but only against data they are allowed. I did a PoC using GPT3 to query OData and it was pretty fun, but did occasionally get the query wrong. I also ran into the context window issue. It would get lost when fed larger OData schema.

DevX1012y ago

You could implement row level security on the database to prevent unauthorized access.

tmostak2y ago· 1 in thread

It wasn't clear to me what evaluation method was being used, the chart in the blog says Execution Accuracy, but the numbers that seem to be used appear to correlate with "Exact Set Match" (comparing on SQL) instead of the "Execution With Values" (comparing on result set values). For example, DIN-SQL + GPT-4 achieves an 85.3% "Execution With Values" score. Is that what is being used here?

See the following for more info:

https://yale-lily.github.io/spider https://github.com/taoyds/spider/tree/master/evaluation_exam...

MrezaPourreza2y ago

Hello, thank you very much for your meticulous comment. The 85.3% accuracy reported in our paper (I'm one of the authors of the DIN-SQL paper) pertains to the test set. However, in the blog post, we are reporting the performance on the development set, which stands at 74.2%.

kristianp2y ago

So if they're doing 50,000,000 tokens to do the fine-tune, @ $0.08/1000 tokens and 2 epochs, that's about $800 to do the fine-tune. Not prohibitive from a business standpoint.

saigalOP2y ago

A tutorial on how to fine-tune a GPT3.5 model for Natural Language to SQL tasks and a comparison of its performance vs Retrieval Augmented Generation. Based on the results, fine-tuning can match and outperform RAG (the approach matches the state of the art on accuracy while being far faster and cheaper). The big challenge for fine-tuning tasks like this is building the training datasets. Things should get even more interesting when GPT-4 is opened for fine-tuning.

1 more reply

MrezaPourreza2y ago

I believe this article underscores the significance and efficacy of fine-tuning for specific tasks. Looking ahead, I envision the integration of fine-tuned models with RAG agents and models, further enhancing overall performance.

doitLP2y ago

But still only ~75% accurate execution. Completely unusable and it can’t explain how it got the data it got.

philipodonnell2y ago

Did you provide any db schema information in the prompts after the model was fine tuned?

j / k navigate · click thread line to collapse

75 comments

56 comments · 16 top-level

bob10292y ago· 10 in thread

Instead of going for generative, few-shot models, I am starting to look at the other end of the spectrum: Binary classification into deterministic query building.

sillysaurusx2y ago

PheonixPharts2y ago

I think you have fallen into the trap of mistaking interpolation for generalization.

That's not to say that interpolation in a rich latent space of text isn't very useful. But it's not the same level of abstraction that comes from true generalization in this space.

4 more replies

Zircom2y ago

Did you read the article you posted?

-ChatGPT initially gives the same diagnosis the vets did, Babesiosis

-It notes that Babesiosis may have either been a misdiagnosis or there may be a secondary condition/infection causing the remaining symptoms after the Babesiosis treatment didn't resolve all of them

-It suggests such a hypothetical secondary condition could be IMHA, which the article notes is an extremely common complication of Babesiosis with this specific dog breed

-A quick Google search brings up a fair amount of literature about the association between Babesiosis and IMHA

So in fact this is the opposite of a never before seen situation, ChatGPT was just regurgitating common comorbidities of Babesiosis and the vets in question are terrible at their job.

1 more reply

mejutoco2y ago

> With ChatGPT, you cannot realistically explain why you got some output in a way that anyone other than an AI/ML expert would find satisfying

bob10292y ago

> I think you've fallen into the trap of "AIs don't generalize, they memorize."

Binary classifiers don't generalize?

Just because my output is not generative does not mean we are cannot learn / generalize elsewhere. Think of it as a 2-stage process.

lmeyerov2y ago

At the same time, we're also moving to make a lot more of that process AI controlled, just across LLM calls (e.g., LLM-dictated ones), so it's a funny maturity process.

Nick_Molter2y ago

bob10292y ago

These tools are already fantastic at our 80% average case, even without fine tuning. We are seeing some value, but the real pain is in that other 20%.

MuffinFlavored2y ago

> Binary classification into deterministic query building.

I have no clue what this means.

philipodonnell2y ago

h1fra2y ago· 10 in thread

The cost per question seems super high. I can't even think of an API where a single call would cost $1cent. You better have a good pricing model to follow up on this.

beebmam2y ago

Really? I sure can. I can even imagine a single API request that triggers a job that costs $100,000. These APIs exist. They're not for individuals.

h1fra2y ago

1 more reply

riku_iki2y ago

> I can't even think of an API where a single call would cost $1cent.

paying engineer for the same job will be many factors more expensive.

jug2y ago

This is basically what put me off when trying to use GPT-3.5 for this with SQL LangChain. And you can’t go cheaper LLM from OpenAI because then accuracy plummeted. GPT-3.5 is the sweet spot.

claytonjy2y ago

Google Speech-to-Text is one I've dealt with recently; costs over 1¢ to transcribe 1 minute of audio. Almost 8¢/minute for the medical version.

h1fra2y ago

1¢ per 1minute is fair as it would take the best translator much more time than that. 1¢ per sql query seems less fair to me, but we could argue it would also cost more to ask a dev.

1 more reply

Legend24402y ago

What? Some of the APIs I use at work cost $100+ per call. (it's not really about the call, it's about the data it provides)

IanCal2y ago

It's about 0.2 cents right?

It's hard for something you interact with manually to provide positive value less than 0.2c.

thewataccount2y ago

That's for the non-finetuned GPT3.5turbo model.

Finetuned is 1.2cents/1k in and 1.6cents/1k out. So it'll likely be closer to 2cents depending on what you're doing.

I'm not saying it's not useful, at 2c per query you have to be more "purposeful" as they could certainly add up depending on how you use it compared to 0.2c.

1 more reply

Zetobal2y ago

If your app is able to function better for the cost of the API calls the value might just be there.

totalhack2y ago· 4 in thread

* https://github.com/totalhack/zillion

antupis2y ago

Yeah it is literally billion dollar question how to do that querying. Is there some good papers or blog posts that go through different approaches?

totalhack2y ago

noorvir2y ago

Can you describe in more detail what you mean by semantic? Doesn’t the DB schema capture the business logic to a large extent?

totalhack2y ago

From our good friend GPT-4 who can summarize it better than I can...

    Unified View: It provides a unified view of data from multiple sources, making it easier for users to access and understand the data without having to know the underlying structure or complexity.

    Data Governance: It can enforce business rules, security policies, and data quality measures, ensuring that data is consistent, compliant, and accessible only by authorized users.

    Flexibility: A semantic layer is often designed to be flexible, allowing business users to adapt to changes in the data model without requiring changes to the applications themselves.

    Query Simplification: The layer simplifies the process of querying data by providing a more user-friendly way of accessing and manipulating data, often through a drag-and-drop interface or other graphical tools.

    Decoupling: It decouples application development from data source changes. When underlying data changes, you don't necessarily have to update all the applications that use it; you might only need to update the semantic layer.

    Data Integration: It can integrate data from multiple sources, providing a single "source of truth" for business users and applications.

Purpose:

    Simplification: Make data more accessible and easier to understand for non-technical users.

    Consistency: Ensure that everyone is working from the same definitions and business rules.

    Efficiency: Reduce the time and complexity involved in generating reports, analytics, and other data-driven functions.

    Data Quality: Help to enforce data quality and governance policies.

    Security: Provide a mechanism for enforcing security rules on who can see or modify data.

    Adaptability: Enable quicker adaptation to changes in business needs or data structures.

philipodonnell2y ago· 3 in thread

Are you planning on submitting this model to be evaluated against the Spider holdout set?

aazo112y ago

In theory one could create domain specific (or industry specific) templates for data. However coming up with a universal structure might be challenging since data is so varied.

Since the issue is often the context, plugging in data dictionaries (and passing those to the LLM) can help

MrezaPourreza2y ago

philipodonnell2y ago

codeulike2y ago· 3 in thread

I see in their training set they've got comments about columns too. e.g.

    "Country" TEXT NOT NULL, - country where the singer born

But thats still not enough.

codeulike2y ago

ainesh932y ago

> You also need a bunch of information about the real business that the data is describing.

capableweb2y ago

iandanforth2y ago· 3 in thread

Spider isn't anything like the queries that analysts and data scientists write against DBs. I don't think it even has many joins. Do you not have access to a more realistic training set?

philipodonnell2y ago

tmostak2y ago

ainesh932y ago

Although Spider is better known in the text-to-SQL world, you're right that BiRD may provide a better testing ground. Comparing against the current leaderboard on that standard is on the docket!

TedDallas2y ago· 2 in thread

At a certain level of complexity it is easier to write the damn SQL than it is to explain your query to GPT.

gloosx2y ago

In the end, the one who possesses knowledge will turn it into value going the most obvious and straightforward way: writing a piece of useful SQL or code hot from the head.

GPT is a tool for learners, and they will keep shooting them feet until they learn, just the weapon is different now.

saigalOP2y ago

True. The goal would be write the complex SQL query once, then file it away as golden SQL so that in the future a similarly complex query could be done in natural language.

sandGorgon2y ago· 2 in thread

do you not use a vector db and embeddings search to get the table structure, etc ?

aazo112y ago

Most RAG approaches use a vectorDB and embeddings for schema linking. In this case the fine-tuning is handling schema linking and there is no vectorDB.

sandGorgon2y ago

that is super interesting. Do you find that schema-linking using finetuning works better ?

lukev2y ago· 1 in thread

This brings up a question I've had about OpenAI's fine-tuning APIs.

The training format is a series of question/answer pairs: i.e, what we might think of as "supervised" learning. It can be challenging to build good data sets for this scenario.

But transformer-based models actually don't require supervised learning: the bulk training of the GPT family is to just throw masses of text at it and have it do next-token prediction.

What's going on with the difference? When I "fine-tune" GPT-3.5 turbo, am I actually training the transformer, or the RLHF model that sits on top? Or both?

mistercow2y ago

[1] https://arxiv.org/pdf/1909.08593.pdf

shireboy2y ago· 1 in thread

DevX1012y ago

You could implement row level security on the database to prevent unauthorized access.

tmostak2y ago· 1 in thread

See the following for more info:

https://yale-lily.github.io/spider https://github.com/taoyds/spider/tree/master/evaluation_exam...

MrezaPourreza2y ago

kristianp2y ago

So if they're doing 50,000,000 tokens to do the fine-tune, @ $0.08/1000 tokens and 2 epochs, that's about $800 to do the fine-tune. Not prohibitive from a business standpoint.

saigalOP2y ago

1 more reply

MrezaPourreza2y ago

doitLP2y ago

But still only ~75% accurate execution. Completely unusable and it can’t explain how it got the data it got.

philipodonnell2y ago

Did you provide any db schema information in the prompts after the model was fine tuned?

j / k navigate · click thread line to collapse