I recently ran into a situation where I needed SQLAlchemy to connect to an existing database (well, actually, several hundred) with overlapping table names, but each table had user changeable schemas. Some tables had 20 columns and some had a few hundred. Defining the schema in models wasn’t going to work.
SQLAlchemy supports the ability to discover the schema, but most tutorials ended up with read only functionality as the Row objects were simply tuples. After much digging, it turns out SQLAlchemy can autodiscover the table structure while supporting CRUD operations. This is what the example code looks like:
import os from sqlalchemy import MetaData, create_engine, Table, Column, Integer from sqlalchemy.orm import sessionmaker from sqlalchemy.ext.automap import automap_base class DBConnector(): """ Instantiates a connection to a supplied database and supports CRUD operations """ def __init__(self, database_name): """ Establishes the connection parameters :param database_name: name of target database :type database_name: str """ self.mapped_tables = ["LIST OF TARGET TABLE NAMES"] #self.metadata = MetaData() self.Base = automap_base() self.db = create_engine( 'mssql+pyodbc://%s:%s@%s/%s?driver=ODBC+Driver+18+for+SQL+Server&TrustServerCertificate=yes' % ( os.getenv('DB_USER'), os.getenv('DB_PASSWORD'), os.getenv('SQL_HOST'), database_name)) self.connection = self.db.connect() for table_name in self.mapped_tables: table_obj = Table(table_name, self.Base.metadata, Column('ID', Integer, primary_key=True), autoload=True, autoload_with=self.db) self.Base.prepare() sqa_tbl = self.Base.classes[table_name] self.__setattr__(table_name, sqa_tbl) Session = sessionmaker(bind=self.db) self.session = Session()
So, a few important things on the above:
- You need to know the tables you want to map. This could be updated as a variable. In my use case, it was the same across all couple hundred.
- The most important thing is that you need to specify the primary key column otherwise automap_base will not instantiate the attributes. This is not necessary if the primary key is defined but for some reason in the DBs I was working in, they weren’t.
- The above is not threadsafe so use separate processes or leverage a pool.
- The above should probably have a __repr__ defined that helps identify the database it’s attached to.
- To prevent namespace collisions, it may be beneficial to
TestDB = DBConnector('test_db') result = TestDB.session.query(TestDB.TABLENAME).all()