ArtPrompt: a jailbreak that bypasses AI filters using ASCII art


The ArtPrompt method

Progress in artificial intelligence keeps accelerating, and it calls for an additional layer of security to keep people with malicious intent from misusing tools that have become double-edged swords.

In the development of LLMs, which are now used across a wide range of applications, security is no longer optional: we have seen many times what their misuse looks like.

Even with all these safeguards in place, problems continue to surface. Some of them originate in the training data, which at first glance contains nothing unusual or dangerous until other possible interpretations of that data are taken into account.

The reason for mentioning this is that information was recently released about a new attack called "ArtPrompt", which takes advantage of the limitations of AIs in recognizing ASCII art to bypass safety measures and trigger unwanted behavior in the models.

The attack was discovered by researchers from the universities of Washington, Illinois and Chicago, who describe "ArtPrompt" as a method for bypassing the restrictions of artificial intelligence chatbots such as GPT-3.5, GPT-4 (OpenAI), Gemini (Google), Claude (Anthropic) and Llama2 (Meta).

The attack runs in two steps and takes advantage of the way models process text formatted as ASCII art. The first step identifies the words in the prompt that could trigger a refusal, so that the filters that detect dangerous queries can be evaded. In the second step, those words are masked using ASCII art to create a camouflaged prompt, which then manages to induce harmful responses from the model.
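The masking step can be pictured with a minimal Python sketch. It assumes a tiny hand-made font (`TINY_FONT`) and a deliberately naive keyword filter (`naive_filter`); both are hypothetical illustrations rather than the researchers' tooling, and the masked word is a harmless placeholder.

```python
# Hypothetical 5-row block font covering only the letters used below.
TINY_FONT = {
    "C": ["###", "#  ", "#  ", "#  ", "###"],
    "A": ["###", "# #", "###", "# #", "# #"],
    "T": ["###", " # ", " # ", " # ", " # "],
}

def render_ascii(word: str) -> str:
    """Render a word as ASCII art, placing the glyphs side by side."""
    glyphs = [TINY_FONT[ch] for ch in word.upper()]
    rows = ["  ".join(g[r] for g in glyphs) for r in range(5)]
    return "\n".join(rows)

def naive_filter(text: str, blocked: set[str]) -> bool:
    """Return True if any blocked word appears verbatim in the text."""
    lowered = text.lower()
    return any(word in lowered for word in blocked)

def mask_word(prompt: str, word: str) -> str:
    """Replace the word with a placeholder and append its ASCII-art form."""
    return prompt.replace(word, "[MASK]") + "\n" + render_ascii(word)

blocked = {"cat"}  # harmless stand-in for a filtered word
plain = "Tell me about the cat."
masked = mask_word(plain, "cat")

print(naive_filter(plain, blocked))   # True: the word appears verbatim
print(naive_filter(masked, blocked))  # False: only '#' shapes remain
print(masked)
```

The sketch only shows why a filter based on string matching no longer sees the word once it is drawn with characters; real safety filters are more elaborate than this.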

ArtPrompt's effectiveness was evaluated on five chatbots, demonstrating its ability to bypass existing defenses and to outperform other types of jailbreak attacks. To evaluate how well chatbots recognize queries written as ASCII art, the researchers propose the "Vision-in-Text Challenge (VITC)" as a benchmark.

This challenge tests a model's ability to interpret and answer queries that use ASCII art, and it shows that LLMs struggle to understand queries that represent even a single letter or digit as ASCII art. Model accuracy drops significantly as the queries contain more characters, which exposes a weakness in the ability of LLMs to process visual information encoded in this way. In addition, other attacks on LLMs and defenses against jailbreaks are reviewed.
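A minimal sketch of how recognition accuracy could be broken down by query length is shown below. The function name `accuracy_by_length`, the exact-match scoring rule and the sample pairs are assumptions made for illustration; they are not taken from the benchmark itself.

```python
from collections import defaultdict

def accuracy_by_length(samples):
    """Exact-match accuracy grouped by the number of characters in the label.

    `samples` is an iterable of (expected, predicted) string pairs.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for expected, predicted in samples:
        n = len(expected)
        totals[n] += 1
        # Assumption: a prediction counts only if it matches exactly,
        # ignoring case and surrounding whitespace.
        if predicted.strip().upper() == expected.upper():
            hits[n] += 1
    return {n: hits[n] / totals[n] for n in sorted(totals)}

# Made-up pairs purely to exercise the function; not benchmark data.
toy = [("A", "A"), ("7", "7"), ("B", "8"), ("CAT", "CAT"), ("DOG", "D0G")]
print(accuracy_by_length(toy))
```

Grouping by label length makes it easy to see whether accuracy falls as more characters are drawn, which is the trend the challenge reports.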

ArtPrompt is reported to be noticeably more effective than other known methods: it achieved the highest quality of ASCII art recognition on models such as Gemini, GPT-4 and GPT-3.5, with successful filter bypass rates of 100%, 98% and 92% respectively in testing. The attack success rates recorded were 76%, 32% and 76%, and the harmfulness of the responses obtained was rated at 4.42, 3.38 and 4.56 points on a five-point scale, respectively.
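As a rough sketch of how three figures of this kind could be derived from judged responses, consider the Python below. The `JudgedResponse` record, its field names and the toy records are hypothetical; the article does not describe how each response was judged, so that part is assumed.

```python
from dataclasses import dataclass

@dataclass
class JudgedResponse:
    bypassed_filter: bool   # assumption: the model did not refuse
    attack_succeeded: bool  # assumption: a judge marked the reply as harmful
    harm_score: int         # 1 (harmless) to 5 (most harmful)

def summarize(records):
    """Return bypass rate, attack success rate (ASR) and mean harm score."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to summarize")
    return {
        "bypass_rate": sum(r.bypassed_filter for r in records) / total,
        "asr": sum(r.attack_succeeded for r in records) / total,
        "mean_harm_score": sum(r.harm_score for r in records) / total,
    }

# Toy records to exercise the function; these are not measured results.
toy_records = [
    JudgedResponse(True, True, 5),
    JudgedResponse(True, False, 3),
    JudgedResponse(True, True, 5),
    JudgedResponse(False, False, 1),
]
print(summarize(toy_records))
```

Keeping the bypass rate separate from the ASR matters because a reply can slip past the filter without being harmful enough to count as a successful attack.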

ArtPrompt also stands out from other jailbreak attacks in how it builds harmful instructions: the others require a large number of iterations, while ArtPrompt achieves the highest attack success rate (ASR) of all the jailbreak attacks with a single iteration. The reason is that ArtPrompt can efficiently build the whole set of covert prompts and send them to the model in parallel.
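The idea of submitting a whole set of prompts at once, rather than refining one prompt over many rounds, can be sketched with Python's standard thread pool. `query_model` is a hypothetical stand-in that makes no network call, and the prompt strings are neutral placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a chatbot API call; it only reports length."""
    return f"response to {len(prompt)} characters"

def query_all(prompts, max_workers=4):
    """Submit every prompt concurrently and return replies in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of its inputs.
        return list(pool.map(query_model, prompts))

variants = ["variant one", "variant two!", "v3"]
print(query_all(variants))
```

Because no variant depends on the reply to another, the whole batch completes in a single round, which is what "a single iteration" amounts to here.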

In addition, the researchers showed that the common defenses currently in use (Paraphrase and Retokenization) are not effective at blocking this type of attack. Interestingly, applying Retokenization even increased the number of requests that were processed successfully, which underlines the need to develop new strategies for dealing with these kinds of threats when interacting with chatbots.

ArtPrompt stands out for its ability to bypass existing defenses, and the researchers state that it will remain effective against multimodal language models: as long as the models keep taking images as input, the model can be confused and ArtPrompt can still trigger unsafe behavior.

Finally, if you are interested in learning more about it, you can check the details at the following link.

